| # Kernel track events: format and conventions |
| |
| This page describes a convention for structuring Linux kernel tracepoints in a |
| way that enables perfetto to automatically present them as slice/counter tracks |
| at the UI and SQL levels, without having to change or rebuild perfetto code. |
| |
| This is a perfetto convention, and does not have (or need) any dedicated |
| upstream kernel code. It's best used when hacking on a local kernel, or writing |
| a self-contained module that won't be upstreamed. It is also not explicitly |
| tied to static tracepoints, a dynamic probe (e.g. kprobe) that creates a |
| `tracefs` entry with the relevant fields will also work. |
| |
| This page is structured as a reference, an introduction with **examples and |
| screenshots** of resulting UI is at ["Intrumenting the Linux kernel with |
| ftrace"][ftrace-intro-link]. |
| |
| [ftrace-intro-link]: /docs/getting-started/ftrace#part-c-simple-slice-counter-visualisations-without-modifying-perfetto-code-kernel-track-events- |
| |
| *This convention is still malleable, if you end up using it and/or finding |
| issues with the design, please send an email to our mailing list or file a |
| github issue.* |
| |
| ## Slices and instants |
| |
| Perfetto looks for fields with specific types and names in the event's data |
| representation. This is defined by `TP_STRUCT__entry()` when using the |
| `TRACE_EVENT()` macro to define the tracepoint. |
| |
| For representing slices (begin + end) and instants, grouped by tracks, the |
| well-known fields are: |
| |
| | required? | type | name | |
| | --- | --- | --- | |
| | required | char | track\_event\_type | |
| | required | \_\_string | slice\_name | |
| | optional | intX | scope\_{...} | |
| | optional | \_\_string | track\_name | |
| |
| Where `intX` represents any integral type, and `__string` is the kernel type |
| used for storing dynamically-sized strings in tracing events. |
| |
| At runtime, the event payloads will be interpreted as follows: |
| |
| * `track_event_type`: |
| * `'B'` opens a named slice. |
| * `'E'` ends the last opened slice within the track. |
| * `'I'` sets a named instant (zero duration) event. |
| |
| * `slice_name`: the name of the slice for begin ('B') and instant ('I') events, |
| ignored for end events. |
| |
| * `track_name`: if set, overrides the track's name. The default is the |
| tracepoint's name. |
| |
| * `scope_{...}`: if set, specifies the scoping id of the track, which is used |
| as a grouping key for the tracks. The field name can have an arbitrary suffix |
| that makes sense within your subsystem, but there are also a few well-known |
| names that perfetto can use as a hint when presenting the tracks in the UI. |
| The id does not have to be related to an OS-level concept. |
| * `scope_tgid`: for process-scoped tracks, where the value must be of a valid |
| process (though the calling thread does not need to be within that process). |
| * `scope_cpu`: for cpu-scoped tracks (emitting code does not need to be |
| running on that cpu). |
| * `scope_your_feature_idx`: for your own track id assignments. |
| * *default*: thread-scoped track (using the thread id of the thread hitting |
| the tracepoint, as recorded by the ftrace system itself). |
| |
| Additionally: |
| |
| The tracepoint name and the subsystem can be arbitrary. Your headers can |
| declare an arbitrary amount of tracepoints that match these templates. Each |
| tracepoint will be processed indepdendently. |
| |
| There are no constraints on having additional fields, the field order or other |
| parts of the `TRACE_EVENT()` declaration. Note that this includes the printk |
| specifier, so the textual formatting of the tracepoint can be arbitrary (you |
| don't even need to print the perfetto-specific fields). |
| |
| ## Counters |
| |
| For representing counter values, grouped by tracks, the well-known fields are: |
| |
| | required? | type | name | |
| | --- | --- | --- | |
| | required | intX | counter\_value | |
| | optional | intX | scope\_{...} | |
| | optional | \_\_string | track\_name | |
| |
| ## Details on scoping (grouping) events |
| |
| This section explains the rules of how the recorded events get grouped into |
| tracks, as generally a trace recorded using a single tracepoint can result in N |
| separate tracks. The grouping rules are the same for slice and counter tracks. |
| |
| **NB:** slices on slice tracks *must* have strict nesting - all slices must |
| terminate before their parents (see the concept of [async |
| slices][async-slice-link] for more details). You need to use track naming or |
| scoping to ensure that that invariant is preserved. |
| |
| The default behaviour (if you only specify the mandatory fields) is |
| thread-scoped. Events are grouped by the thread id of the thread(s) hitting the |
| tracepoints. There will be one track per thread with events. The end ('E') |
| events will terminate the last opened slice on that thread. |
| |
| If the event has a field prefixed with `scope_`, the events will be grouped by |
| the value of that field, with some predefined names having special meaning (see |
| above). For example, if you specify a `scope_tgid`, that turns the track |
| process-scoped - all events sharing the same `scope_tgid` value will be put on |
| the same track. Further, the UI will present that track in the process' group. |
| |
| If your events include the `track_name` field, then events become grouped by |
| that name as an additional dimension to the above. That is, the end ('E') event |
| will terminate the last opened slice with that exact track name, even if there |
| are multiple named tracks within the same thread/process/cpu/etc scope. |
| |
| The net effect is that recorded events are grouped by the unique combination |
| of: `{tracepoint} x {track name} x {scope id}`. With the last two defaulting to |
| the tracepoint name and thread id respectively. |
| |