| # Batch Trace Processor |
| |
| _The Batch Trace Processor is a Python library wrapping the |
| [Trace Processor](/docs/analysis/trace-processor.md): it allows fast (<1s) |
| interactive queries on large sets (up to ~1000) of traces._ |
| |
| ## Installation |
| |
| Batch Trace Processor is part of the `perfetto` Python library and can be |
| installed by running: |
| |
| ```shell |
| pip3 install pandas # prerequisite for Batch Trace Processor |
| pip3 install perfetto |
| ``` |
| |
| ## Loading traces |
| NOTE: if you are a Googler, have a look at |
| [go/perfetto-btp-load-internal](http://goto.corp.google.com/perfetto-btp-load-internal) for how to load traces from Google-internal sources. |
| |
The simplest way to load traces is to pass a list of file paths:
| ```python |
| from perfetto.batch_trace_processor.api import BatchTraceProcessor |
| |
| files = [ |
| 'traces/slow-start.pftrace', |
| 'traces/oom.pftrace', |
| 'traces/high-battery-drain.pftrace', |
| ] |
| with BatchTraceProcessor(files) as btp: |
| btp.query('...') |
| ``` |
| |
| [glob](https://docs.python.org/3/library/glob.html) can be used to load |
| all traces in a directory: |
| ```python |
import glob

from perfetto.batch_trace_processor.api import BatchTraceProcessor

| files = glob.glob('traces/*.pftrace') |
| with BatchTraceProcessor(files) as btp: |
| btp.query('...') |
| ``` |
| |
| NOTE: loading too many traces can cause out-of-memory issues: see |
| [this](/docs/analysis/batch-trace-processor#memory-usage) section for details. |
| |
A common requirement is to load traces located in the cloud or fetched by
sending a request to a server. To support this use case, traces can also be
loaded using [trace URIs](/docs/analysis/batch-trace-processor#trace-uris):
| ```python |
| from perfetto.batch_trace_processor.api import BatchTraceProcessor |
| from perfetto.batch_trace_processor.api import BatchTraceProcessorConfig |
| from perfetto.trace_processor.api import TraceProcessorConfig |
| from perfetto.trace_uri_resolver.registry import ResolverRegistry |
| from perfetto.trace_uri_resolver.resolver import TraceUriResolver |
| |
class FooResolver(TraceUriResolver):
  # See the "Trace URIs" section below for how to implement a URI resolver.
  pass

config = BatchTraceProcessorConfig(
    tp_config=TraceProcessorConfig(
        resolver_registry=ResolverRegistry(resolvers=[FooResolver])
    )
)
| with BatchTraceProcessor('foo:bar=1,baz=abc', config=config) as btp: |
| btp.query('...') |
| ``` |
| |
| ## Writing queries |
Writing queries with batch trace processor works very similarly to the
[Python API](/docs/analysis/trace-processor#python-api).
| |
| For example, to get a count of the number of userspace slices: |
| ```python |
| >>> btp.query('select count(1) from slice') |
| [ count(1) |
| 0 2092592, count(1) |
| 0 156071, count(1) |
| 0 121431] |
| ``` |
| The return value of `query` is a list of [Pandas](https://pandas.pydata.org/) |
| dataframes, one for each trace loaded. |
| |
A common requirement is for all of the traces to be flattened into a
single dataframe instead of getting one dataframe per trace. To support this,
the `query_and_flatten` function can be used:
| ```python |
| >>> btp.query_and_flatten('select count(1) from slice') |
| count(1) |
| 0 2092592 |
| 1 156071 |
| 2 121431 |
| ``` |
| |
| `query_and_flatten` also implicitly adds columns indicating the originating |
| trace. The exact columns added depend on the resolver being used: consult your |
| resolver's documentation for more information. |
| |
| ## Trace URIs |
Trace URIs are a powerful feature of the batch trace processor. URIs decouple
the notion of "paths" to traces from the filesystem. Instead, the URI
describes *how* a trace should be fetched (e.g. by sending an HTTP request
to a server, from cloud storage etc.).
| |
The syntax of trace URIs is similar to that of web
[URLs](https://en.wikipedia.org/wiki/URL). Formally, a trace URI has the
structure:
| ``` |
| Trace URI = protocol:key1=val1(;keyn=valn)* |
| ``` |
| |
| As an example: |
| ``` |
| gcs:bucket=foo;path=bar |
| ``` |
| would indicate that traces should be fetched using the protocol `gcs` |
| ([Google Cloud Storage](https://cloud.google.com/storage)) with traces |
| located at bucket `foo` and path `bar` in the bucket. |
| |
NOTE: the `gcs` resolver is *not* actually included: it's simply given as an
easy-to-understand example.
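
To make the grammar concrete, here is a small illustrative parser for this
URI syntax. This function is *not* part of the `perfetto` library; the real
parsing happens inside the resolver machinery:

```python
def parse_trace_uri(uri: str):
  # Split 'protocol:key1=val1;key2=val2' into the protocol and a
  # dict of key-value arguments.
  protocol, _, args_str = uri.partition(':')
  args = dict(kv.split('=', 1) for kv in args_str.split(';'))
  return protocol, args

print(parse_trace_uri('gcs:bucket=foo;path=bar'))
# → ('gcs', {'bucket': 'foo', 'path': 'bar'})
```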
| |
URIs are only part of the puzzle: ultimately batch trace processor still needs
the bytes of the traces to be able to parse and query them. The job of
converting URIs to trace bytes is left to *resolvers*: Python classes, each
associated with a *protocol*, which use the key-value pairs in the URI
to look up the traces to be parsed.
| |
By default, batch trace processor ships with only a single resolver, which
knows how to look up filesystem paths. However, custom resolvers can easily
be created and registered. See the documentation on the
| [TraceUriResolver class](https://cs.android.com/android/platform/superproject/+/master:external/perfetto/python/perfetto/trace_uri_resolver/resolver.py;l=56?q=resolver.py) |
| for information on how to do this. |
| |
## Memory usage
Memory usage is very important to pay attention to when working with batch
trace processor. Every trace loaded lives fully in memory: this is the magic
behind making queries fast (<1s) even on hundreds of traces.
| |
This also means that the number of traces you can load is heavily limited by
the amount of memory available. As a rule of thumb, if your average trace size
is S and you are trying to load N traces, expect around 2 * S * N of memory
usage. Note that this can vary significantly based on the exact contents and
sizes of your traces.
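
For example, applying the rule of thumb above (the sizes here are purely
illustrative):

```python
avg_trace_size_gb = 0.5  # S: average trace size in GB (illustrative)
num_traces = 100         # N: number of traces to load

# Rule of thumb from above: expect around 2 * S * N of memory usage.
est_memory_gb = 2 * avg_trace_size_gb * num_traces
print(f'Estimated memory usage: ~{est_memory_gb:.0f} GB')
# → Estimated memory usage: ~100 GB
```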
| |
| ## Advanced features |
| ### Sharing computations between TP and BTP |
Sometimes it can be useful to parameterise code to work with either trace
processor or batch trace processor. The `execute` and `execute_and_flatten`
functions can be used for this purpose:
```python
from perfetto.batch_trace_processor.api import BatchTraceProcessor
from perfetto.trace_processor.api import TraceProcessor

def some_complex_calculation(tp):
| res = tp.query('...').as_pandas_dataframe() |
| # ... do some calculations with res |
| return res |
| |
| # |some_complex_calculation| can be called with a [TraceProcessor] object: |
| tp = TraceProcessor('/foo/bar.pftrace') |
| some_complex_calculation(tp) |
| |
| # |some_complex_calculation| can also be passed to |execute| or |
| # |execute_and_flatten| |
| btp = BatchTraceProcessor(['...', '...', '...']) |
| |
| # Like |query|, |execute| returns one result per trace. Note that the returned |
| # value *does not* have to be a Pandas dataframe. |
| [a, b, c] = btp.execute(some_complex_calculation) |
| |
| # Like |query_and_flatten|, |execute_and_flatten| merges the Pandas dataframes |
| # returned per trace into a single dataframe, adding any columns requested by |
| # the resolver. |
| flattened_res = btp.execute_and_flatten(some_complex_calculation) |
| ``` |