|  | # Batch Trace Processor | 
|  |  | 
|  | _The Batch Trace Processor is a Python library wrapping the | 
|  | [Trace Processor](/docs/analysis/trace-processor.md): it allows fast (<1s) | 
|  | interactive queries on large sets (up to ~1000) of traces._ | 
|  |  | 
|  | ## Installation | 
|  |  | 
|  | Batch Trace Processor is part of the `perfetto` Python library and can be | 
|  | installed by running: | 
|  |  | 
|  | ```shell | 
|  | pip3 install pandas       # prerequisite for Batch Trace Processor | 
|  | pip3 install perfetto | 
|  | ``` | 
|  |  | 
|  | ## Loading traces | 
|  | NOTE: if you are a Googler, have a look at | 
|  | [go/perfetto-btp-load-internal](http://goto.corp.google.com/perfetto-btp-load-internal) for how to load traces from Google-internal sources. | 
|  |  | 
|  | The simplest way to load traces is to pass a list of file paths: |
|  | ```python | 
|  | from perfetto.batch_trace_processor.api import BatchTraceProcessor | 
|  |  | 
|  | files = [ |
|  |     'traces/slow-start.pftrace', |
|  |     'traces/oom.pftrace', |
|  |     'traces/high-battery-drain.pftrace', |
|  | ] |
|  | with BatchTraceProcessor(files) as btp: |
|  |     btp.query('...') |
|  | ``` | 
|  |  | 
|  | [glob](https://docs.python.org/3/library/glob.html) can be used to load | 
|  | all traces in a directory: | 
|  | ```python | 
|  | import glob |
|  |  |
|  | from perfetto.batch_trace_processor.api import BatchTraceProcessor |
|  |  |
|  | files = glob.glob('traces/*.pftrace') |
|  | with BatchTraceProcessor(files) as btp: |
|  |     btp.query('...') |
|  | ``` | 
|  |  | 
|  | NOTE: loading too many traces can cause out-of-memory issues: see the |
|  | [memory usage](/docs/analysis/batch-trace-processor#memory-usage) section for details. |
|  |  | 
|  | A common requirement is to load traces stored in the cloud or fetched by sending |
|  | a request to a server. To support this use case, traces can also be loaded |
|  | using [trace URIs](/docs/analysis/batch-trace-processor#trace-uris): | 
|  | ```python | 
|  | from perfetto.batch_trace_processor.api import BatchTraceProcessor | 
|  | from perfetto.batch_trace_processor.api import BatchTraceProcessorConfig | 
|  | from perfetto.trace_processor.api import TraceProcessorConfig | 
|  | from perfetto.trace_uri_resolver.registry import ResolverRegistry | 
|  | from perfetto.trace_uri_resolver.resolver import TraceUriResolver | 
|  |  | 
|  | class FooResolver(TraceUriResolver): |
|  |     # See "Trace URIs" section below for how to implement a URI resolver. |
|  |     ... |
|  |  |
|  | config = BatchTraceProcessorConfig( |
|  |     # See "Trace URIs" below |
|  | ) |
|  | # Key-value pairs are separated by ';', matching the URI syntax described in "Trace URIs". |
|  | with BatchTraceProcessor('foo:bar=1;baz=abc', config=config) as btp: |
|  |     btp.query('...') |
|  | ``` | 
|  |  | 
|  | ## Writing queries | 
|  | Writing queries with batch trace processor works very similarly to the | 
|  | [Python API](/docs/analysis/batch-trace-processor#python-api). | 
|  |  | 
|  | For example, to get a count of the number of userspace slices: | 
|  | ```python | 
|  | >>> btp.query('select count(1) from slice') | 
|  | [  count(1) | 
|  | 0  2092592,   count(1) | 
|  | 0   156071,   count(1) | 
|  | 0   121431] | 
|  | ``` | 
|  | The return value of `query` is a list of [Pandas](https://pandas.pydata.org/) | 
|  | dataframes, one for each trace loaded. | 
|  |  | 
|  | A common requirement is for all of the traces to be flattened into a | 
|  | single dataframe instead of getting one dataframe per-trace. To support this, | 
|  | the `query_and_flatten` function can be used: | 
|  | ```python | 
|  | >>> btp.query_and_flatten('select count(1) from slice') | 
|  | count(1) | 
|  | 0  2092592 | 
|  | 1   156071 | 
|  | 2   121431 | 
|  | ``` | 
|  |  | 
|  | `query_and_flatten` also implicitly adds columns indicating the originating | 
|  | trace. The exact columns added depend on the resolver being used: consult your | 
|  | resolver's documentation for more information. | 
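
To illustrate what flattening means, the sketch below reproduces the idea with plain Pandas: each per-trace dataframe is tagged with its origin and the results are concatenated. It does not call the library; the dataframes are stand-ins built from the example output above, and the `trace_path` column name is an assumption made for this sketch, not necessarily what any resolver adds.

```python
import pandas as pd

# Stand-ins for the per-trace dataframes that `query` would return.
per_trace = {
    'traces/slow-start.pftrace': pd.DataFrame({'count(1)': [2092592]}),
    'traces/oom.pftrace': pd.DataFrame({'count(1)': [156071]}),
    'traces/high-battery-drain.pftrace': pd.DataFrame({'count(1)': [121431]}),
}

# Tag each dataframe with its origin, then concatenate into a single one.
# `trace_path` is a hypothetical column name chosen for this sketch.
flattened = pd.concat(
    [df.assign(trace_path=path) for path, df in per_trace.items()],
    ignore_index=True,
)
print(flattened)
```

Keeping an origin column makes it possible to group or filter the flattened result by trace afterwards.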
|  |  | 
|  | ## Trace URIs | 
|  | Trace URIs are a powerful feature of the batch trace processor. URIs decouple | 
|  | the notion of "paths" to traces from the filesystem. Instead, the URI | 
|  | describes *how* a trace should be fetched (e.g. by sending an HTTP request |
|  | to a server, from cloud storage, etc.). |
|  |  | 
|  | The syntax of trace URIs is similar to web |
|  | [URLs](https://en.wikipedia.org/wiki/URL). Formally a trace URI has the | 
|  | structure: | 
|  | ``` | 
|  | Trace URI = protocol:key1=val1(;keyn=valn)* | 
|  | ``` | 
|  |  | 
|  | As an example: | 
|  | ``` | 
|  | gcs:bucket=foo;path=bar | 
|  | ``` | 
|  | would indicate that traces should be fetched using the protocol `gcs` | 
|  | ([Google Cloud Storage](https://cloud.google.com/storage)) with traces | 
|  | located at bucket `foo` and path `bar` in the bucket. | 
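
To make the grammar concrete, here is a minimal sketch of how a URI of this shape could be split into a protocol and its key-value pairs. The function name `parse_trace_uri_sketch` is hypothetical and this is not the library's own parser; it only illustrates the syntax described above.

```python
from typing import Dict, Tuple


def parse_trace_uri_sketch(uri: str) -> Tuple[str, Dict[str, str]]:
    """Splits 'protocol:key1=val1;key2=val2' into (protocol, {key: val}).

    Hypothetical helper for illustration only; not part of the perfetto API.
    """
    protocol, sep, args_str = uri.partition(':')
    if not sep or not protocol:
        raise ValueError(f'not a trace URI: {uri!r}')

    args: Dict[str, str] = {}
    for pair in args_str.split(';'):
        key, eq, value = pair.partition('=')
        if not eq or not key:
            raise ValueError(f'malformed key-value pair: {pair!r}')
        args[key] = value
    return protocol, args


protocol, args = parse_trace_uri_sketch('gcs:bucket=foo;path=bar')
print(protocol, args)
```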
|  |  | 
|  | NOTE: the `gcs` resolver is *not* actually included: it is given here simply |
|  | because it is an easy-to-understand example. |
|  |  | 
|  | URIs are only a part of the puzzle: ultimately batch trace processor still needs | 
|  | the bytes of the traces to be able to parse and query them. The job of | 
|  | converting URIs to trace bytes is left to *resolvers*: Python classes, each |
|  | associated with a *protocol*, which use the key-value pairs in the URI |
|  | to look up the traces to be parsed. |
|  |  | 
|  | By default, batch trace processor ships with only a single resolver, which knows |
|  | how to look up filesystem paths; however, custom resolvers can easily be |
|  | created and registered. See the documentation on the |
|  | [TraceUriResolver class](https://cs.android.com/android/platform/superproject/+/master:external/perfetto/python/perfetto/trace_uri_resolver/resolver.py;l=56?q=resolver.py) | 
|  | for information on how to do this. | 
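
The relationship between protocols and resolvers can be pictured with a toy model like the one below. It deliberately does not use the `perfetto` classes: `ToyResolver`, `InMemoryResolver`, `REGISTRY`, `resolve_uri` and the `mem` protocol are all hypothetical names invented for this sketch, and the shape of the returned value is an assumption. Consult the `TraceUriResolver` documentation linked above for the real interface.

```python
from typing import Dict, List, Type


class ToyResolver:
    """Base class for this sketch; NOT perfetto's TraceUriResolver."""
    PROTOCOL = ''

    def resolve(self) -> List[dict]:
        raise NotImplementedError


class InMemoryResolver(ToyResolver):
    """Hypothetical resolver that serves trace bytes from a dict."""
    PROTOCOL = 'mem'
    STORE = {'a': b'trace-a-bytes', 'b': b'trace-b-bytes'}

    def __init__(self, name: str):
        self.name = name

    def resolve(self) -> List[dict]:
        # Return the trace bytes plus metadata describing their origin.
        return [{'trace': self.STORE[self.name],
                 'metadata': {'name': self.name}}]


# Registry mapping each protocol to the class that handles it.
REGISTRY: Dict[str, Type[ToyResolver]] = {
    InMemoryResolver.PROTOCOL: InMemoryResolver,
}


def resolve_uri(uri: str) -> List[dict]:
    # Split off the protocol, then turn 'k1=v1;k2=v2' into keyword arguments.
    protocol, _, args_str = uri.partition(':')
    kwargs = dict(pair.split('=', 1) for pair in args_str.split(';'))
    return REGISTRY[protocol](**kwargs).resolve()


results = resolve_uri('mem:name=a')
print(results)
```

The point of the indirection is that the caller only ever handles URI strings; how the bytes are obtained is entirely up to the class registered for the protocol.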
|  |  | 
|  | ## Memory usage | 
|  | Memory usage is important to keep in mind when working with batch |
|  | trace processor. Every trace loaded lives fully in memory: this is the magic behind |
|  | making queries fast (<1s) even on hundreds of traces. |
|  |  | 
|  | This also means that the number of traces you can load is heavily limited by |
|  | the amount of memory available. As a rule of thumb, if your |
|  | average trace size is S and you are trying to load N traces, expect roughly |
|  | 2 * S * N of memory usage. Note that this can vary significantly based on the |
|  | exact contents and sizes of your traces. |
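
As a quick worked example of the rule of thumb, the sketch below estimates memory for three made-up trace sizes. The helper name `estimate_btp_memory_bytes` and the sizes are hypothetical; the result is only a rough guide, for the reasons given above.

```python
def estimate_btp_memory_bytes(trace_sizes_bytes):
    """Applies the 2 * S * N rule of thumb; sum(sizes) equals S * N."""
    return 2 * sum(trace_sizes_bytes)


# Three hypothetical traces of 150 MB, 200 MB and 250 MB (S = 200 MB, N = 3).
# For real files, the sizes could be gathered with os.path.getsize instead.
MB = 1024 * 1024
sizes = [150 * MB, 200 * MB, 250 * MB]
estimate = estimate_btp_memory_bytes(sizes)
print(f'{estimate / MB:.0f} MB')  # prints "1200 MB"
```

Comparing such an estimate against the memory available on the machine before loading gives an early warning of likely out-of-memory failures.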
|  |  | 
|  | ## Advanced features | 
|  | ### Sharing computations between TP and BTP | 
|  | Sometimes it can be useful to parameterise code to work with either trace | 
|  | processor or batch trace processor. `execute` or `execute_and_flatten` | 
|  | can be used for this purpose: | 
|  | ```python | 
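|  | from perfetto.batch_trace_processor.api import BatchTraceProcessor |
|  | from perfetto.trace_processor.api import TraceProcessor |
|  |  |
|  | def some_complex_calculation(tp): |
|  |     res = tp.query('...').as_pandas_dataframe() |
|  |     # ... do some calculations with res |
|  |     return res |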
|  |  | 
|  | # |some_complex_calculation| can be called with a [TraceProcessor] object: | 
|  | tp = TraceProcessor('/foo/bar.pftrace') | 
|  | some_complex_calculation(tp) | 
|  |  | 
|  | # |some_complex_calculation| can also be passed to |execute| or | 
|  | # |execute_and_flatten| | 
|  | btp = BatchTraceProcessor(['...', '...', '...']) | 
|  |  | 
|  | # Like |query|, |execute| returns one result per trace. Note that the returned | 
|  | # value *does not* have to be a Pandas dataframe. | 
|  | [a, b, c] = btp.execute(some_complex_calculation) | 
|  |  | 
|  | # Like |query_and_flatten|, |execute_and_flatten| merges the Pandas dataframes | 
|  | # returned per trace into a single dataframe, adding any columns requested by | 
|  | # the resolver. | 
|  | flattened_res = btp.execute_and_flatten(some_complex_calculation) | 
|  | ``` |