NOTE: This doc is designed for administrators of Bigtrace services NOT Bigtrace users. This is also designed for non-Googlers - Googlers should look at go/bigtrace
instead.
Bigtrace is a tool which facilitates the processing of traces in the O(million) by distributing instances of TraceProcessor across a Kubernetes cluster.
The design of Bigtrace consists of four main parts:
There are three clients to interact with Bigtrace: a Python API, clickhouse-client and Apache Superset.
The Orchestrator is the central component of the service and is responsible for sharding traces to the various Worker pods and streaming the results to the Client.
Each Worker runs an instance of TraceProcessor and performs the inputted query on a given trace. Each Worker runs on its own pod in the cluster.
The object store contains the set of traces the service can query from and is accessed by the Worker. Currently, there is support for GCS as the main object store and the loading of traces stored locally on each machine for testing.
Additional integrations can be added by creating a new repository policy in src/bigtrace/worker/repository_policies.
The recommended way to deploy Bigtrace is on Google Kubernetes Engine and this guide will explain the process.
Prerequisites:
In addition to the default API access of the Compute Engine service account, the following permissions are required:
These can be added on GCP through IAM & Admin > IAM > Permissions.
To use kubectl to apply the yaml files for deployments and services you must first connect and authenticate with the cluster.
You can follow these instructions on device or in cloud shell using the following command:
gcloud container clusters get-credentials [CLUSTER_NAME] --zone [ZONE]--project [PROJECT_NAME]
The deployment of Orchestrator requires two main steps: Building and pushing the images to Artifact Registry & deploying to the cluster.
To build the image and push to Artifact Registry, first navigate to the perfetto directory and then run the following commands:
docker build -t bigtrace_orchestrator src/bigtrace/orchestrator docker tag bigtrace_orchestrator [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_orchestrator docker push [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_orchestrator
To use the images from the registry which were built in the previous step, the orchestrator-deployment.yaml file must be modified to replace the line.
image: [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_orchestrator
The CPU resources should also be set depending on the vCPUs per pod as chosen before.
resources: requests: cpu: [VCPUS_PER_MACHINE] limits: cpu: [VCPUS_PER_MACHINE]
Then to deploy the Orchestrator you apply both the orchestrator-deployment.yaml and the orchestrator-ilb.yaml, for the deployment and internal load balancing service respectively.
kubectl apply -f orchestrator-deployment.yaml kubectl apply -f orchestrator-ilb.yaml
This deploys the Orchestrator as a single replica in a pod and exposes it as a service for access within the VPC by the client.
Similar to the Orchestrator first build and push the images to Artifact Registry.
docker build -t bigtrace_worker src/bigtrace/worker docker tag bigtrace_worker [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_worker docker push [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_worker
Then modify the yaml files to reflect the image as well as fit the required configuration for the use case.
image: [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/bigtrace_worker ... replicas: [DESIRED_REPLICA_COUNT] ... resources: requests: cpu: [VCPUS_PER_MACHINE]
Then deploy the deployment and service as follows:
kubectl apply -f worker-deployment.yaml kubectl apply -f worker-service.yaml
This image builds on top of the base Clickhouse image and provides the necessary Python libraries for gRPC to communicate with the Orchestrator.
docker build -t clickhouse src/bigtrace_clickhouse docker tag clickhouse [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/clickhouse docker push [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/clickhouse
To deploy this on a pod in a cluster, the provided yaml files must be applied using kubectl e.g.
kubectl apply -f src/bigtrace_clickhouse/config.yaml kubectl apply -f src/bigtrace_clickhouse/pvc.yaml kubectl apply -f src/bigtrace_clickhouse/pv.yaml kubectl apply -f src/bigtrace_clickhouse/clickhouse-deployment.yaml kubectl apply -f src/bigtrace_clickhouse/clickhouse-ilb.yaml
With the clickhouse-deployment.yaml you must replace the image variable with the URI to the image built in the previous step - which contains the Clickhouse image with the necessary Python files for gRPC installed on top.
The env variable BIGTRACE_ORCHESTRATOR_ADDRESS must also be changed to the address of the Orchestrator service given by GKE:
containers: - name: clickhouse image: # [ZONE]-docker.pkg.dev/[PROJECT_NAME]/[REPO_NAME]/clickhouse env: - name: BIGTRACE_ORCHESTRATOR_ADDRESS value: # Address of Orchestrator service
Contains the image of the Clickhouse server and configures the necessary volumes and resources.
This Internal Load Balancer is used to allow for the Clickhouse server pod to be reached from within the VPC in GKE. This means that VMs outside the cluster are able to access the Clickhouse server through Clickhouse Client, without exposing the service to the public.
These files create the volumes needed for the Clickhouse server to persist the databases in the event of pod failure.
This is where Clickhouse config files can be specified to customize the server to the user's requirements. (https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings)
You can deploy Clickhouse in a variety of ways by following: https://clickhouse.com/docs/en/install
When running the client through CLI it is important to specify: ./clickhouse client --host [ADDRESS] --port [PORT] --receive-timeout=1000000 --send-timeout=100000 --idle_connection_timeout=1000000
There are two methods of deploying Superset - one for development and one for production.
You can deploy an instance of Superset within a VM for development by following: https://superset.apache.org/docs/quickstart
You can deploy a production ready instance on Kubernetes across pods by following: https://superset.apache.org/docs/installation/kubernetes
Superset can then be connected to Clickhouse via clickhouse-connect by following the instructions at this link, but replacing the first step with the connection details of the deployment: https://clickhouse.com/docs/en/integrations/superset