commit | a9f287ea4bd3f01cc10c5d56c3de5965fd133726 | [log] [tgz] |
---|---|---|
author | Ryan Savitski <rsavitski@google.com> | Tue Mar 19 02:36:23 2024 +0000 |
committer | Ryan Savitski <rsavitski@google.com> | Tue Mar 19 02:36:23 2024 +0000 |
tree | 68065660c3a59012c317ff08125967b8f8527f39 | |
parent | 7e63f42d755ca255162f1f327126f90e34465581 [diff] |
ftrace: write "last read event timestamp" to assist parsing of ring buffer traces

-- Problem: --

Ftrace data is recorded per-cpu, grouping series of events from a given
per-cpu kernel buffer into an ftrace_event_bundle packet. When a perfetto
trace uses RING_BUFFER, wrapping of the oldest trace packets can leave us
with very imbalanced per-cpu state, as entire bundles of events are
overwritten. This in turn is problematic when interpreting certain ftrace
events requires pairing them up (e.g. slice begin/end) or otherwise
tracking transitions, and the related events are not constrained to a
single cpu.

Consider the following scenario of a {begin, end} pair of events that
represent a tracing slice. Assuming there's one bundle per cpu:

cpu0: [e...E..e..e] <- bundle
           ^event end
cpu1: [.B.....e..eee.] <- bundle
        ^event begin

It is possible for us to observe cpu1's events while cpu0's have been
overwritten in the ring buffer. In such a case, a naive parsing of all
events would conclude that the slice began and never ended.

The above is just the simplest example; things get more complicated once
we start reasoning about differing kernel tracing bandwidth across cpus,
and userspace transcoding breaking up reads in terms of KBs of tracing
data instead of timestamps.

-- Solution: --

Since traced_probes is the exclusive reader of the per-cpu kernel ring
buffers, it knows the latest "consumed" event's timestamp for each buffer.
By putting this timestamp into the *next* bundle, we record that "all
ftrace data in this bundle is guaranteed to cover the time range starting
from where the last read stopped".

Example sketch. Initial state of the per-cpu ring buffers:

cpu0: .eeee.e.
cpu1: ee...ee.
cpu2: ..ee.ee.

After a read pass, some data has been consumed from each buf:

cpu0: XXXee.e.
cpu1: XXXXXXe.
cpu2: XXXe.ee.

Now, on the *next* bundle per cpu, we'll emit how far the consumed ('X')
area extends.

In turn, from the trace_processor perspective, we can drop events until
the earliest timestamp covered by all per-cpu event bundles. This is the
maximum of all per-cpu "valid from" timestamps.

Bug: 192586066
Change-Id: Iebc6952a2f2a6a399b8cde64f210e28a69c04111
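The trimming rule described above can be sketched in a few lines. This is a hypothetical illustration, not Perfetto's actual trace_processor code; the names `valid_from_cutoff` and `drop_stale_events` are invented for this sketch.

```python
# Hypothetical sketch of the ring-buffer trimming rule described in the
# commit message. Names are invented for illustration; this is not
# Perfetto's real API.

def valid_from_cutoff(per_cpu_valid_from):
    """Earliest timestamp covered by *all* per-cpu bundles.

    per_cpu_valid_from: one "last read event timestamp" per cpu, taken
    from each cpu's bundles after the ring buffer wrapped.
    """
    return max(per_cpu_valid_from)

def drop_stale_events(events, cutoff):
    """Keep only (cpu, ts, name) events at or after the cutoff."""
    return [e for e in events if e[1] >= cutoff]

# cpu0's data is only valid from t=40, cpu1's from t=25:
cutoff = valid_from_cutoff([40, 25])
events = [(1, 30, "B"), (0, 45, "E"), (1, 50, "B")]
kept = drop_stale_events(events, cutoff)
# The t=30 "begin" on cpu1 is dropped: cpu0 may have lost its matching end.
```

Taking the maximum (rather than the minimum) of the per-cpu timestamps is what guarantees every surviving event lies in a range where no cpu's bundle data has been overwritten.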
Perfetto is a production-grade open-source stack for performance instrumentation and trace analysis. It offers services and libraries for recording system-level and app-level traces, native + java heap profiling, a library for analyzing traces using SQL, and a web-based UI to visualize and explore multi-GB traces.
See https://perfetto.dev/docs or the /docs/ directory for documentation.