blob: 1167cf6488cf57f753ceba360b265bdc0fbfc121 [file] [log] [blame] [view] [edit]
# Learning to Read GPU Frame Captures
This is a gentle introduction to learning how one may go about reading a GPU
frame capture. This would be using a tool like the [Xcode GPU Frame
Debugger](https://developer.apple.com/documentation/metal/frame_capture_debugging_tools?language=objc),
[RenderDoc](https://renderdoc.org/), or [Android GPU
Inspector](https://gpuinspector.dev/). If you are already comfortable using one
or all of these tools, this introduction is likely too rudimentary for you. If
so, please skip this.
If you are working on Impeller (or any low-level graphics API for that matter),
it is unlikely you are going to get any work done without a frame debugger.
Fortunately, it is also extremely easy and fun. But it does require learning a
new skill-set.
I liken getting proficient at graphics debugging to learning how to drive. It is
absolutely a skill-set that must be learned. And, you get better at it the more
you practice.
The car you choose to learn to drive on really doesn’t matter. It may be gas or
electric, stick-shift or automatic. Admittedly, some cars are easier to learn to
drive on than others. But again, the car isn’t the point. The same holds for
graphics debuggers and the client rendering APIs. If you can read a GPU Frame
capture of a Vulkan frame on Windows using RenderDoc, you should be quickly able
to read a trace for a Metal Frame on iOS using Xcode. In fact, in a
cross-platform framework like Impeller, it is unlikely you are going to be able
to get away with using just one debugger. Like cars, all of them have their own
quirks and use-cases with no one-size-fits all solution.
# Start in an Empty Parking Lot
You wouldn’t start learning to drive on a busy freeway or city street. So, if
you immediately open a frame trace of a very complicated application, you are
likely to get overwhelmed.
Start with a frame that renders absolutely nothing. You are only figuring out
what the pedals in the car are and what the gauges mean. We are going to be
using Xcode in this tutorial in case you are following along. But again, the car
doesn’t matter.
Make sure you have already set up an Xcode session by following the instructions
in the wiki. Setup a test that opens a blank playground. With the playground
running, click on the stylized `M` to capture a Metal frame.
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image1.avif "image_tooltip")
Give Xcode a few seconds to capture the frame and show the frame overview.
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image2.avif "image_tooltip")
Let’s figure out what the gauges of this car mean.
* In box `4`, the overview shows that there are no draw calls and one command
buffer with one render command encoder. This is for the playground to render
the blank screen with the clear-color.
* The playground renders a dark slate gray clear color as it was adequately
contrasty with the primary colors and also black and white.
* Box `2` shows the Metal calls made grouped by the API call. If you click on
the Group by API Call dropdown, you can group the calls made according to the
pipeline state. But we have no draw calls remember, so this will be empty.
But, realize that in a more complicated application where you are looking for
a single class of draw calls, viewing by the pipeline state is going to be
more useful.
* When grouping by the API call, absolutely all calls made to the Metal API will
be shown in the sidebar. Most of them are not particularly interesting. These
include calls to allocate memory, create command buffers, set labels, etc.. To
whittle down this list to a (potentially) more interesting subset, click the
little flag at the bottom (see box `8`). But, if you ever find a call you were
looking for is not present in the sidebar, you may have filtered it away.
* Box `5` shows frame performance. But there is nothing to show as we are not
rendering anything. We’ll come back to this later.
* Box `6` shows the graphics memory overview. We’ll revisit this in detail later
too. But, it is a good idea to see what memory rendering a blank slate needs.
Realize that all graphics memory is not equal and learning when to use one vs
the other can lead to some interesting performance improvements.
* Box `7` is Xcodes attempt at showing you how you can improve performance.
These are just insights though and not warnings or errors. But, in every
frame, try to understand and reason about each insight to see if action is
necessary. In most cases, you can address these insights fairly easily. In the
example above, there are three insights. Lets reason about them:
* There are two insights for the late creation of two textures. From the
names of the textures, you can tell that one is the texture used for the
stencil buffer and another the color texture used for the 4xMSAA resolve
step. Impeller uses memory-less textures for those on iOS and the
playground is running on Mac. So it hasn’t bothered to create and reuse
textures in the playground runner. But, it should. And Xcode’s point that
texture allocations should not occur in a frame workload is well made.
Advice that is universally prudent when working on Impeller.
* The last insight is that the main render pass is empty. Well, no shit,
Sherlock. We won’t have this concern in a real application. The playground
will always render frames over and over specifically so that a frame
debugger can capture a frame. Even if nothing is in that frame. This won’t
be a problem in Flutter where no frame will be rendered if nothing
changes.
* Notice that we could immediately tell what the two textures that were
created late were for. This is because all GPU objects in Impeller have
the ability to be labelled. In fact most APIs in Impeller make it very
hard to create unlabelled objects. If you notice an object that is not
labelled, file a bug to label it. Better yet, find and label it yourself.
Building for easier instrumentation must be done diligently and
proactively. And it is your responsibility!
* Box `1` is the nav stack that you will use often and is unreasonably effective
in Xcode relative to other debuggers. It’s a good idea to remember its key
binding (mine is ctrl+cmd+arrow). If you click on something and find yourself
lost, go back to a known point (usually the summary).
* Box `3` highlights an `Export` button. This allows you to export a GPU trace.
But, realize that whoever views a GPU Trace needs to have identical hardware.
The traces are also rather large. So, in a single debugging session, you
should store these traces locally so you can check how your iterations are
affecting the frame. But you may not find sending these to others super
useful.
Before we trace anything more complicated, let’s take a look at the memory usage
in the playground.
# Memory Overview
Click on the `Show Memory` button in Box `6` from the previous section. We are
still not rendering anything in the playground.
An overview of all graphics memory usage is shown.
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image3.avif "image_tooltip")
Along with all the objects that occupy memory, their locations in different
memory categorizations are also shown. Notice how the totals all add up to the
same number. This is useful in cases where you forgot to assign the optimum
memory storage mode for textures or buffers (private, managed, or memory-less).
You can double click an object to inspect it and highlighting a texture should
give you a preview of its contents.
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image4.avif "image_tooltip")
Do not underestimate the usefulness of filtering the results either by category
name or resource name. You can filter by category by selecting the small
circular callstack button next to the category. When you apply filters, the
memory totals will update to reflect just filtered items. Here, there is 3 MB of
device memory for managed textures.
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image5.avif "image_tooltip")
You can also apply freeform text filters to the resources using the text field
highlighted by the second box. This is used by multiple Impeller subsystems. For
example, offscreen textures that are used within a frame between multiple
render-passes are named such that they can be easily filtered. So, if you wanted
to estimate the memory overhead of such render-passes (say you are working on
optimizations to elide these), you can easily tell using a simple text filter.
This also highlights the importance of always naming all GPU resources. Again,
if you find an unnamed resource in this view, file a but to have it be tagged or
[tag it yourself](#finding-where-api-calls-were-made-in-the-codebase).
The “Time Since Last Used” is useful in catching potential memory leaks as
allocations not referenced for multiple frames must typically be collected to
save memory. Flutter applications typically have tons of these as its image
caches reference images that aren’t used for a while. If these are tagged
correctly (which they should be), they can be filtered away. That way, you can
focus on leaks in specific subsystems without having “cached” items confuse the
memory view.
# Driving on the Street
So we’re comfortable with the car in the parking lot and we know what all the
gauges and pedals do. Let’s drive this car onto a quiet street.
Let’s render a scene that actually renders something. But, let’s just render a
solid red triangle in the playground.
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image6.avif "image_tooltip")
We notice two changes in the overview.
* When grouping the calls by the pipeline state, we see one pipeline listed with
one draw call. Since all GPU objects in Impeller are labelled, we see one
pipeline aptly called the `SolidFillPipeline` with one draw call.
* The `Performance` section in box `5` from the last section is no longer empty.
Let’s dive into each of the new sections.
## Inspecting the Pipeline State Object
All draw calls use a pipeline state object that specifies the programmable and
fixed function elements of the draw call as well as the data referenced by that
draw call.
The programmable elements of the pipeline state object are defined by shaders
that are written on the host and compiled into the engine in an intermediate
representation. Vertex shaders are run per vertex in the draw call and fragment
shaders run once per texture element in the coverage area of the draw call.
There are numerous fixed function elements in a pipeline state object. But the
major ones Impeller typically must configure are blend modes (i.e, how the new
texture element must combine with items already in the framebuffer), sample
counts for resolve (used in MSAA), pixel formats of its various attachments,
etc..
Pipeline state objects are immutable. So, if either the programmable or fixed
function element of the object needs to be modified, a new variant must be
created.
So, if you see multiple instances of a named pipeline in the grouping of calls
by pipeline state, realize that it is a different variant of a prototype
pipeline state. If these are not named appropriately and you can’t tell the
difference, file a bug to disambiguate them or [tag them
yourself](#finding-where-api-calls-were-made-in-the-codebase)!
Let’s click on the `SolidFill Pipeline` in the example to analyze that pipeline.
All draw calls listed below that pipeline use the same programmable and fixed
function pipeline configuration.
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image7.avif "image_tooltip")
You will get intimately familiar with this view when you set up a new pipeline
state object in Impeller or try to reason about the correctness of one of the
pipeline state object variants.
In this example, we can tell that all draw calls with this pipeline state have
blending enabled with the given blend mode and work on images with `BGRA8Unorm`
pixel format. The draw call can also expect a stencil buffer.
Clicking on either the vertex or fragment shader should show the equivalent
Metal source code for the GLSL shader written in Impeller. This Metal source
code (and the shader debugger) is only available in debug and profile modes.
When GLSL shaders are written in Impeller, they are converted into intermediate
representation for packaging with the engine. However, since debugging shaders
is such a useful exercise, the shader compiler will also compile the GLSL
shaders into Metal source code and then package it with the debug or profile
engine alongside the intermediate representation that is actually used. That
way, the Xcode frame debugger can find that code when you ask to debug the
programmable elements of the pipeline.
We'll go into using the shader debugger later. But, now you know how to inspect
a pipeline.
## Inspecting a Single Draw Call
Each draw call must reference a pipeline state (that we already know how to
inspect) and provide references to the data used by that draw call (like vertex
and uniform buffers, attachments) along with metadata about it (like primitive
topology).
To inspect how each draw call is configured, select the call in the sidebar.
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image8.avif "image_tooltip")
To get an overview of the draw call, the Bound Resources section is the most
useful view. Let’s ensure we understand each item.
The `Pipeline States` section we have already [covered in
detail](#inspecting-the-pipeline-state-object).
In the `Vertex` section, the `Geometry` lists how each vertex is transformed by
the vertex shader.
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image9.avif "image_tooltip")
Here, you see how each vertex (three here since we are rendering a triangle) is
transformed by the shader such that it ends up in the correct spot in normalized
device coordinates. In this particular case, the solid color seems to be
presented to the vertex shader in a uniform with the shader passing it along to
the fragment stage as an output. An improvement could be to present the uniform
directly to the fragment stage. Impeller may have done this because only a
single uniform buffer for all stages was easier to set up.
You can double click on any buffer in the `Bound Resources` section to dump
information about that buffer presented in a view appropriate for that stage.
When I double click the buffer containing the uniform data, the following view
is shown.
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image10.avif "image_tooltip")
Pay specific attention to the `Row` index. Impeller architecture doesn’t create
small individual buffers for uniform data. All uniform data for a single render
pass is packed into a single jumbo uniform buffer with each draw call
referencing its uniform data at an offset into this larger buffer. This allows
Impeller to avoid small allocations and use a simpler and faster bump allocator.
Here, it looks like the uniform data is towards the end of that jumbo buffer as
noted by the negative indices present in the view. The data at negative indices
is interpreted as garbage when viewed through the lens of the uniform data
layout the draw call expects.
The other useful item in the `Bound Resources` section is the state of the
attachments when the draw call was made. This comes in particularly handy for
debugging writes to a buffer that you will never actually see. For instance, the
stencil buffers.
To demonstrate debugging stencil buffers, I captured a trace of a Fuchsia
colored rectangle clipped to a circular shape. You’d never see the stencil
buffer so it would be hard to understand how the draw call is affecting it
without viewing the attachment in the frame debugger. Clicking on the gear to
the right of the buffer label also shows a histogram of the image as well as
options to change the color mapping, or, to view values within a certain range.
In this simple example, the values in the stencil buffer only range from 0 to 2.
So viewing the entire range of values in the stencil buffer would have made the
changes in the buffer indiscernible to you. Xcode helpfully selected the “Min to
Max” view for us. You can do the same for any attachment.
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image11.avif "image_tooltip")
## Debugging a Shader
The shaders authored in Impeller use [GLSL
4.60](https://www.khronos.org/registry/OpenGL/specs/gl/GLSLangSpec.4.60.pdf).
Xcode does not support debugging these shaders natively. To work around this,
the Impeller shader compiler will convert those shaders to Metal source code and
embed them inside debug and profile mode engine binaries alongside the shaders
intermediate representation that is actually used to generate the pipeline state
objects. The Metal source code is converted such that it looks as similar to
GLSL as possible.
You can debug both vertex and fragment shaders. Remember that vertex shaders run
once per vertex (three times in the case of our example that renders a simple
triangle) and fragment shaders run once per texture element in the coverage area
of the draw call (potentially thousands of times depending on the side of the
triangle rendered). So, when you want to debug a shader, you must first find one
specific invocation of either the vertex or fragment shader to debug.
### iOS & macOS: Tell Xcode the Location of your Shaders
When using the Metal backend, instead of packaging shader sources as strings,
Impeller compiles and packages them into a single shader library. This library
is stripped of debugging information to minimize the size overhead. This
debugging information is not tossed away however. In the `out/<variant>/shaders`
directory, you will find a series of files with the `.metallibsym` extension.
When you try to debug a shader for the first time as described in the sections
below, Xcode you prompt you with a dialog that says it can't find the sources
for shader along with a button to show it where to find the relevant
`.metallibsym` files. Click that button and a dialog will pop up showing the
Metal libraries whose `.metallibsym` files could not be resolved.
![No Sources](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/no_sources.avif)
In the "External Source Search Paths" section, click the tiny `+` button at the
bottom. In the file chooser dialog box that appears next, select all the
`metallibsym` files in the `out/<variant>/shaders` directory.
You will only have to do this once per engine variant. The search paths will
remain the same as you rebuild the engine and the `.metallibsym` files contain
the UUID of the shader library. So Xcode won't attempt to resolve shaders
sources in an outdated `.metallibsym` file.
You may however also run into Xcode complaining about "Invalid UUID" errors.
This is instead of the "No Source" errors as shown in the dialog above.
![Invalid UUID](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/invalid_uuid.avif)
The team has been unable to to find documentation for this type of error. But
through trial-and-error, we have determined that the way to fix this is to set
the deployment target of the application to the current OS version during the
instrumentation run (either on macOS or iOS). To information about this line of
troubleshooting can be found [here](https://github.com/flutter/engine/pull/39532).
### Debugging a Fragment Shader
Since fragment shaders are run once per texture element in the coverage area of
the draw call, it is easiest to find invocations of the fragment shader by
opening one of the attachments used by the draw call.
Find and open either the color or stencil attachments in the `Bound Resources`
section as described in the section on [inspecting a single draw
call](#inspecting-a-single-draw-call).
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image12.avif "image_tooltip")
At the bottom right corner of the attachment preview, you will see a disabled
`Debug` button with a crosshair to its right. The button is disabled because no
texture element is selected for debugging. Click on the crosshair and drag the
magnifier on a texture element converted by a draw call. The draw call will be
highlighted with a green outline.
Once a valid texture element is highlighted, the `Debug` button should be
enabled. Click it to debug that one invocation of the fragment shader used by
that draw call.
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image13.avif "image_tooltip")
In the sidebar on the left, each step of execution of the fragment shader is
listed. You can click on each to move back and forth through the invocation. The
values of local variables will be updated as you do.
Some of the usual things to look out for when debugging fragment shaders:
* Pay attention to the input to the fragment stage from the vertex stage. This
is present in the argument marked with `[[stage_in]]`.
* The output of the stage (which defines the color of the texture element for
that invocation) is the return value of the invocation.
* If you aren’t sure of a particular operation within the shader, try adding
intermediate variables to the shader. The Impeller shader compiler will
faithfully add those intermediate for ease of debugging. Optimizations that
hinder debuggability are reserved for optimized release modes and occur on the
intermediate representation.
### Debugging a Vertex Shader
Since vertex shaders are run once per vertex in the draw call, it is easiest to
find an invocation of the vertex shader in the geometry viewer.
In the `Bound Resources` of a specific draw call, open the `Geometry` section as
described in the section on [inspecting a single draw
call](#inspecting-a-single-draw-call).
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image14.avif "image_tooltip")
In this view, the `Debug` button on the bottom right will be disabled unless a
specific vertex in the geometry has been selected. Once you select the vertex
whose vertex shader invocation you want to debug, the button should be enabled.
Click it.
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image15.avif "image_tooltip")
In the sidebar on the left, each step of execution of the vertex shader is
listed. You can click on each to move back and forth through the invocation. The
values of local variables will be updated as you do.
Some of the usual things to look out for when debugging vertex shaders:
* Pay attention to the input to the vertex stage invocation. This is present in
the argument marked with `[[stage_in]]`. This is the data you packed into the
vertex buffer for the draw call using an `impeller::VertexBufferBuilder`.
* The output of the stage (which defines vertex position in normalized device
coordinates) is the return value of the invocation.
* If you aren’t sure of a particular operation within the shader, try adding
intermediate variables to the shader. The Impeller shader compiler will
faithfully add those intermediate for ease of debugging. Optimizations that
hinder debuggability are reserved for optimized release modes and occur on the
intermediate representation.
### Live Shader Editing & Debugging
Often, it is useful to make minor edits to the shader to either visually see the
difference in the attachments or to see how local variables are affected.
When debugging an instrumentation of either the vertex or fragment shader, you
have the ability to edit the Metal source code. When you do, the `Reload Shader`
button at the bottom of the shader viewer that is typically disabled becomes
enabled.
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image16.avif "image_tooltip")
Click on that button to see what that invocation would look like had it used the
updated shader. In the example above, I added an additional offset of 150 units
to the vertex position supplied to the vertex shader by the vertex buffer. When
I clicked on the `Reload Shaders` button, the location of the triangles in both
the color and stencil attachments was updated.
Unless you are only interested in inspecting local variables, it is often useful
to have the attachments viewer open side-by-side as you make live updates to the
shader.
No changes are being made to your GLSL shaders in Impeller. This is purely a
debugging aid and you must re-create those changes in GLSL to commit to those
updates.
# Finding Where API Calls Were Made in the Codebase
From either the frame insights or by selecting an API call on the object, open
the call-stack to navigate to the code that made that call. Then add your label.
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image17.avif "image_tooltip")
When inspecting an API call, reveal the call-stack. This resource has already
been labelled and you’ll find the call in `AllocatorMTL::CreateTexture`.
![alt_text](https://raw.githubusercontent.com/flutter/assets-for-api-docs//5da33067f5cfc7f177d9c460d618397aad9082ca/assets/engine/impeller/read_frame_captures/image18.avif "image_tooltip")
This trace-first approach of navigating an unfamiliar codebase is unreasonably
effective.
# Next Steps & Further Reading
* Try repeating similar steps using a different profiler like RenderDoc or
Android GPU Inspector.
* [Watch] WWDC 2018: [Metal Shader Debugging &
Profiling](https://developer.apple.com/videos/play/wwdc2018/608/).
* [Watch] WWDC 2020: [Gain insights into your Metal app with
Xcode 12](https://developer.apple.com/videos/play/wwdc2020/10605).
* [Watch] WWDC 2020: [Optimize Metal apps and games with GPU
counters](https://developer.apple.com/videos/play/wwdc2020/10603).