# ProtoZero design document

ProtoZero is a zero-copy zero-alloc zero-syscall protobuf serialization library
purposefully built for Perfetto's tracing use cases.

## Motivations

ProtoZero has been designed and optimized for proto serialization, which is used
by all Perfetto tracing paths.
Deserialization was introduced only at a later stage of the project and is
mainly used by offline tools
(e.g., [TraceProcessor](/docs/analysis/trace-processor.md)).
The _zero-copy zero-alloc zero-syscall_ statement applies only to the
serialization code.

Perfetto makes extensive use of protobuf in tracing fast-paths. Every trace
event in Perfetto is a proto
(see [TracePacket](/docs/reference/trace-packet-proto.autogen) reference). This
allows events to be strongly typed and makes it easier for the team to maintain
backwards compatibility using a language that is understood across the board.

Tracing fast-paths need to have very little overhead, because instrumentation
points are sprinkled all over the codebase of projects like Android
and Chrome and are performance-critical.

Overhead here is not defined just as the CPU time (or instructions retired) it
takes to execute the instrumentation point. A big source of overhead in a
tracing system is the working set of the instrumentation points,
specifically extra I-cache and D-cache misses which would slow down the
non-tracing code _after_ the tracing instrumentation point.

The major design departures of ProtoZero from canonical C++ protobuf libraries
like [libprotobuf](https://github.com/google/protobuf) are:

* Treating serialization and deserialization as different use-cases served by
  different code.

* Optimizing for binary size and working-set-size on the serialization paths.

* Ignoring most of the error checking and long-tail features of protobuf
  (repeated vs optional, full type checks).

* ProtoZero is not designed as general-purpose protobuf de/serialization and is
  heavily customized to keep the trace-writing code minimal and allow the
  compiler to see through the architectural layers.

* Code generated by ProtoZero needs to be hermetic. When building the
  amalgamated [Tracing SDK](/docs/instrumentation/tracing-sdk.md), all the
  Perfetto tracing sources must not depend on any library other than the
  C++ standard library and the C library.

## Usage

At the build-system level, ProtoZero is extremely similar to the conventional
libprotobuf library.
The ProtoZero `.proto -> .pbzero.{cc,h}` compiler is built on top of the
libprotobuf parser and compiler infrastructure. ProtoZero is implemented as a
`protoc` compiler plugin.

ProtoZero has a build-time-only dependency on libprotobuf (the plugin depends
on libprotobuf's parser and compiler). The `.pbzero.{cc,h}` code generated by
it, however, has no runtime dependency (not even header-only dependencies) on
libprotobuf.

In order to generate ProtoZero stubs from proto you need to:

1. Build the ProtoZero compiler plugin, which lives in
   [src/protozero/protoc_plugin/](/src/protozero/protoc_plugin/).
   ```bash
   tools/ninja -C out/default protozero_plugin protoc
   ```

2. Invoke the libprotobuf `protoc` compiler passing the `protozero_plugin`:
   ```bash
   out/default/protoc \
       --plugin=protoc-gen-plugin=out/default/protozero_plugin \
       --plugin_out=wrapper_namespace=pbzero:/tmp/ \
       test_msg.proto
   ```
   This generates `/tmp/test_msg.pbzero.{cc,h}`.

NOTE: The .cc file is always empty. ProtoZero-generated code is header-only.
The .cc file is emitted only because some build systems' rules assume that
protobuf codegens generate both a .cc and a .h file.

## Proto serialization

The quickest way to understand ProtoZero design principles is to start from a
small example and compare the generated code between libprotobuf and ProtoZero.

```protobuf
syntax = "proto2";

message TestMsg {
  optional string str_val = 1;
  optional int32 int_val = 2;
  repeated TestMsg nested = 3;
}
```

#### libprotobuf approach

The libprotobuf approach is to generate a C++ class that has one member for each
proto field, with dedicated serialization and de-serialization methods.

```bash
out/default/protoc --cpp_out=. test_msg.proto
```

This generates `test_msg.pb.{cc,h}`. With many degrees of simplification, it
looks as follows:

```c++
// This class is generated by the standard protoc compiler in the .pb.h source.
class TestMsg : public protobuf::MessageLite {
 private:
  int32_t int_val_;
  ArenaStringPtr str_val_;
  RepeatedPtrField<TestMsg> nested_;  // Effectively a vector<TestMsg>

 public:
  const std::string& str_val() const;
  void set_str_val(const std::string& value);

  bool has_int_val() const;
  int32_t int_val() const;
  void set_int_val(int32_t value);

  ::TestMsg* add_nested();
  ::TestMsg* mutable_nested(int index);
  const TestMsg& nested(int index) const;

  std::string SerializeAsString() const;
  bool ParseFromString(const std::string&);
};
```

The main characteristics of these stubs are:

* Code generated from .proto messages can be used in the codebase as general
  purpose objects, without ever using the `SerializeAs*()` or `ParseFrom*()`
  methods (although anecdotal evidence suggests that most projects use these
  proto-generated classes only at the de/serialization endpoints).

* The end-to-end journey of serializing a proto involves two steps:
  1. Setting the individual int / string / vector fields of the generated class.
  2. Doing a serialization pass over these fields.

In turn, this has side-effects on the generated code. STL copy/assignment
operators for strings and vectors are non-trivial because, for instance, they
need to deal with dynamic memory resizing.

#### ProtoZero approach

```c++
// This class is generated by the ProtoZero plugin in the .pbzero.h source.
class TestMsg : public protozero::Message {
 public:
  void set_str_val(const std::string& value) {
    AppendBytes(/*field_id=*/1, value.data(), value.size());
  }
  void set_str_val(const char* data, size_t size) {
    AppendBytes(/*field_id=*/1, data, size);
  }
  void set_int_val(int32_t value) {
    AppendVarInt(/*field_id=*/2, value);
  }
  TestMsg* add_nested() {
    return BeginNestedMessage<TestMsg>(/*field_id=*/3);
  }
};
```

The ProtoZero-generated stubs are append-only. As the `set_*` and `add_*`
methods are invoked, the passed arguments are directly serialized into the
target buffer. This introduces some limitations:

* Readback is not possible: these classes cannot be used as C++ struct
  replacements.

* No error-checking is performed: nothing prevents a non-repeated field from
  being emitted twice in the serialized proto if the caller accidentally calls a
  `set_*()` method twice. Basic type checks are still performed at compile-time
  though.

* Nested fields must be filled in a stack fashion and cannot be written
  interleaved. Once a nested message is started, its fields must be set before
  going back to setting the fields of the parent message. This turns out not to
  be a problem for most tracing use-cases.
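
To make the append-only behavior concrete, here is a minimal, self-contained sketch. It is not ProtoZero's implementation: `ToyMessage`, `ToyTestMsg` and the `std::vector<uint8_t>` output are hypothetical, and the real `protozero::Message` writes through a `ScatteredStreamWriter` rather than into a vector. Each setter encodes the field preamble (`field_id << 3 | wire_type`) and the payload straight into the output, so nothing can be read back and a repeated call simply emits the field again.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for protozero::Message: each call
// appends encoded bytes to `out_`; nothing can be read back or overwritten.
class ToyMessage {
 public:
  explicit ToyMessage(std::vector<uint8_t>* out) : out_(out) { assert(out_); }

 protected:
  // Standard (minimal) base-128 varint encoding.
  void WriteVarInt(uint64_t value) {
    while (value >= 0x80) {
      out_->push_back(static_cast<uint8_t>(value | 0x80));
      value >>= 7;
    }
    out_->push_back(static_cast<uint8_t>(value));
  }

  // Wire type 0 (varint): preamble is (field_id << 3) | 0, then the value.
  void AppendVarInt(uint32_t field_id, uint64_t value) {
    WriteVarInt(static_cast<uint64_t>(field_id) << 3);
    WriteVarInt(value);
  }

  // Wire type 2 (length-delimited): preamble, byte length, then raw bytes.
  void AppendBytes(uint32_t field_id, const char* data, size_t size) {
    WriteVarInt((static_cast<uint64_t>(field_id) << 3) | 2);
    WriteVarInt(size);
    out_->insert(out_->end(), data, data + size);
  }

 private:
  std::vector<uint8_t>* out_;
};

// Hand-written equivalent of the generated TestMsg setters shown above
// (add_nested() is omitted: it needs the size back-filling described later).
class ToyTestMsg : public ToyMessage {
 public:
  using ToyMessage::ToyMessage;
  void set_str_val(const std::string& value) {
    AppendBytes(/*field_id=*/1, value.data(), value.size());
  }
  void set_int_val(int32_t value) {
    // Negative int32 values are sign-extended to 64 bits, as protobuf does.
    AppendVarInt(/*field_id=*/2,
                 static_cast<uint64_t>(static_cast<int64_t>(value)));
  }
};

// Calling the same setter twice is not rejected: both copies are emitted.
std::vector<uint8_t> SerializeIntValTwice() {
  std::vector<uint8_t> buf;
  ToyTestMsg msg(&buf);
  msg.set_int_val(42);
  msg.set_int_val(42);
  return buf;  // 10 2a 10 2a
}
```

A decoder following the usual protobuf "last value wins" rule for non-repeated scalar fields would read `int_val = 42` from that output, which is why the duplicate goes unnoticed rather than failing loudly.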

This has a number of advantages:

* The classes generated by ProtoZero don't add any extra state on top of the
  base class they derive from (`protozero::Message`). They define only inline
  setter methods that call base-class serialization methods. Compilers can
  see through all the inline expansions of these methods.

* As a consequence, the binary cost of ProtoZero is independent of the
  number of protobuf messages defined and their fields, and depends only on the
  number of `set_*`/`add_*` calls. The binary cost of unused proto messages and
  fields has, anecdotally, been a big issue with libprotobuf.

* The serialization methods don't involve any copy or dynamic allocation. The
  inline expansion calls directly into the corresponding `AppendVarInt()` /
  `AppendString()` methods of `protozero::Message`.

* This allows trace events to be serialized directly into the
  [tracing shared memory buffers](/docs/concepts/buffers.md), even if they are
  not contiguous.

### Scattered buffer writing

A key part of the ProtoZero design is supporting direct serialization on
non-globally-contiguous sequences of contiguous memory regions.

This happens by decoupling `protozero::Message`, the base class for all the
generated classes, from the `protozero::ScatteredStreamWriter`.
The problem it solves is the following: ProtoZero is based on direct
serialization into shared memory buffer chunks. These chunks are 4KB - 32KB in
most cases. At the same time, there is no limit to how much data the caller
will try to write into an individual message: a trace event can be up to
256 MiB.

#### Fast-path

At all times the underlying `ScatteredStreamWriter` knows the bounds of the
current buffer. All write operations are bounds-checked and hit a slow-path
when crossing the buffer boundary.

Most write operations can be completed within the current buffer boundaries.
In that case, the cost of a `set_*` operation is in essence a `memcpy()` with
the extra overhead of var-int encoding for protobuf preambles and
length-delimited fields.

#### Slow-path

When crossing the boundary, the slow-path asks the
`ScatteredStreamWriter::Delegate` for a new buffer. The implementation of
`GetNewBuffer()` is up to the client. In tracing use-cases, that call will
acquire a new thread-local chunk from the tracing shared memory buffer.

Other heap-based implementations are possible. For instance, the ProtoZero
sources provide a helper class `HeapBuffered<TestMsg>`, mainly used in tests (see
[scattered_heap_buffer.h](/include/perfetto/protozero/scattered_heap_buffer.h)),
which allocates a new heap buffer when crossing the boundaries of the current
one.

Consider the following example:

```c++
TestMsg outer_msg;
for (int i = 0; i < 1000; i++) {
  TestMsg* nested = outer_msg.add_nested();
  nested->set_int_val(42);
}
```

At some point one of the `set_int_val()` calls will hit the slow-path and
acquire a new buffer. The overall idea is having a serialization mechanism
that is extremely lightweight most of the time and that requires some extra
function calls only when crossing a buffer boundary, so that their cost gets
amortized across all trace events.
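
The fast-path/slow-path split can be modeled with a toy writer. `ToyStreamWriter`, `ToyDelegate` and `ToyHeapDelegate` are hypothetical and only loosely inspired by `ScatteredStreamWriter`, its `Delegate` and `HeapBuffered<>`; their signatures are assumptions. Each write is bounds-checked against the current chunk and is a plain `memcpy()` when it fits; only a write that crosses the boundary calls `GetNewBuffer()`.

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical chunk descriptor and delegate interface (illustrative only).
struct ToyChunk {
  uint8_t* begin;
  uint8_t* end;
};

class ToyDelegate {
 public:
  virtual ~ToyDelegate() = default;
  virtual ToyChunk GetNewBuffer() = 0;
};

// Heap-backed delegate: every slow-path call allocates a new fixed-size chunk.
class ToyHeapDelegate : public ToyDelegate {
 public:
  explicit ToyHeapDelegate(size_t chunk_size) : chunk_size_(chunk_size) {
    assert(chunk_size_ > 0);
  }
  ToyChunk GetNewBuffer() override {
    chunks_.push_back(std::make_unique<uint8_t[]>(chunk_size_));
    uint8_t* p = chunks_.back().get();
    return {p, p + chunk_size_};
  }
  size_t num_chunks() const { return chunks_.size(); }

 private:
  size_t chunk_size_;
  std::vector<std::unique_ptr<uint8_t[]>> chunks_;
};

class ToyStreamWriter {
 public:
  explicit ToyStreamWriter(ToyDelegate* delegate) : delegate_(delegate) {}

  void WriteBytes(const uint8_t* src, size_t size) {
    if (size == 0) return;
    // Fast path: one bounds check, then a plain memcpy into the current chunk.
    if (size <= static_cast<size_t>(end_ - write_ptr_)) {
      memcpy(write_ptr_, src, size);
      write_ptr_ += size;
      return;
    }
    WriteBytesSlowPath(src, size);
  }

 private:
  // Slow path: fill what is left of the current chunk, then ask the delegate
  // for new chunks until the whole write has been copied.
  void WriteBytesSlowPath(const uint8_t* src, size_t size) {
    while (size > 0) {
      if (write_ptr_ == end_) {
        ToyChunk chunk = delegate_->GetNewBuffer();
        write_ptr_ = chunk.begin;
        end_ = chunk.end;
      }
      size_t n = std::min(size, static_cast<size_t>(end_ - write_ptr_));
      memcpy(write_ptr_, src, n);
      write_ptr_ += n;
      src += n;
      size -= n;
    }
  }

  ToyDelegate* delegate_;
  uint8_t* write_ptr_ = nullptr;  // Null until the first chunk is acquired.
  uint8_t* end_ = nullptr;
};
```

As a rough illustration of the loop above: in ProtoZero's encoding each `add_nested()` + `set_int_val(42)` pair produces 7 bytes (1 preamble byte, the 4 reserved size bytes described under Deferred patching, and 2 bytes for the field). With 4 KB chunks, the 1,000 iterations (7,000 bytes) reach the delegate only twice in this toy model: once for the first chunk and once when the 586th record straddles the boundary.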

In the context of the overall Perfetto tracing use case, the slow-path involves
grabbing a process-local mutex and finding the next free chunk in the shared
memory buffer. Hence writes are lock-free as long as they happen within the
thread-local chunk and require a critical section to acquire a new chunk once
every 4KB-32KB (depending on the trace configuration).

The assumption is that the likelihood that two threads will cross the chunk
boundary and call `GetNewBuffer()` at the same time is extremely low and hence
the critical section is uncontended most of the time.
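
The locking pattern can be sketched as follows, assuming a hypothetical `ToyChunkPool` that hands out fixed-size chunks under one process-wide `std::mutex`, while each thread appends to its own `thread_local` chunk without locking. This only illustrates the idea; it ignores commits and events larger than a chunk, both of which the real tracing code handles.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the shared memory buffer: one flat allocation
// split into fixed-size chunks that are handed out under a single mutex.
class ToyChunkPool {
 public:
  ToyChunkPool(size_t num_chunks, size_t chunk_size)
      : storage_(num_chunks * chunk_size), chunk_size_(chunk_size) {
    assert(chunk_size_ > 0);
  }

  // Slow path: the only place where the lock is taken. Returns nullptr once
  // every chunk has been handed out.
  uint8_t* AcquireChunk() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (next_free_ + chunk_size_ > storage_.size()) return nullptr;
    uint8_t* chunk = storage_.data() + next_free_;
    next_free_ += chunk_size_;
    return chunk;
  }

  size_t chunk_size() const { return chunk_size_; }

  size_t chunks_handed_out() {
    std::lock_guard<std::mutex> lock(mutex_);
    return next_free_ / chunk_size_;
  }

 private:
  std::mutex mutex_;
  std::vector<uint8_t> storage_;
  size_t chunk_size_;
  size_t next_free_ = 0;
};

// Per-thread write cursor. For simplicity it assumes a single pool per
// process, so it is not keyed by pool.
struct ToyThreadChunk {
  uint8_t* ptr = nullptr;
  uint8_t* end = nullptr;
};
thread_local ToyThreadChunk tls_chunk;

// Copies one event into the calling thread's chunk; returns false if the pool
// is exhausted. Simplification: an event never spans two chunks, so the
// unused tail of the old chunk is simply abandoned.
bool ToyWriteEvent(ToyChunkPool* pool, const uint8_t* data, size_t size) {
  assert(size > 0 && size <= pool->chunk_size());
  if (static_cast<size_t>(tls_chunk.end - tls_chunk.ptr) < size) {
    uint8_t* chunk = pool->AcquireChunk();  // Lock taken only here.
    if (chunk == nullptr) return false;
    tls_chunk.ptr = chunk;
    tls_chunk.end = chunk + pool->chunk_size();
  }
  memcpy(tls_chunk.ptr, data, size);  // Lock-free: the chunk is thread-owned.
  tls_chunk.ptr += size;
  return true;
}
```

In a multi-threaded program each thread gets its own `tls_chunk`, so threads only contend on `mutex_` when two of them run out of space at the same moment.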

```mermaid
sequenceDiagram
    participant C as Call site
    participant M as Message
    participant SSR as ScatteredStreamWriter
    participant DEL as Buffer Delegate
    C->>M: set_int_val(...)
    activate C
    M->>SSR: AppendVarInt(...)
    deactivate C
    Note over C,SSR: A typical write on the fast-path

    C->>M: set_str_val(...)
    activate C
    M->>SSR: AppendString(...)
    SSR->>DEL: GetNewBuffer(...)
    deactivate C
    Note over C,DEL: A write on the slow-path when crossing 4KB - 32KB chunks.
```

### Deferred patching

Nested messages in the protobuf binary encoding are prefixed with their
varint-encoded size.

Consider the following:

```c++
TestMsg* nested = outer_msg.add_nested();
nested->set_int_val(42);
nested->set_str_val("foo");
```

The canonical encoding of this protobuf message, using libprotobuf, would be:

```bash
1a 07 0a 03 66 6f 6f 10 2a
^-+-^ ^-----+------^ ^-+-^
  |         |          |
  |         |          +--> Field ID: 2 [int_val], value = 42.
  |         |
  |         +------> Field ID: 1 [str_val], len = 3, value = "foo" (66 6f 6f).
  |
  +------> Field ID: 3 [nested], length: 7  # !!!
```

The second byte in this sequence (07) is problematic for direct encoding. At the
point where `outer_msg.add_nested()` is called, we can't possibly know upfront
what the overall size of the nested message will be (in this case, 5 + 2 = 7).

The way we get around this in ProtoZero is by reserving four bytes for the
_size_ of each nested message and back-filling them once the message is
finalized (or when we try to set a field in one of the parent messages).
We do this by encoding the size of the message using redundant varint encoding,
in this case: `87 80 80 00` instead of `07`.
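
The redundant form is still a valid varint: every byte but the last has the continuation bit (0x80) set, and the extra all-zero 7-bit groups contribute nothing to the value, so an ordinary varint decoder reads `87 80 80 00` back as 7. The following sketch, with hypothetical function names, writes a size as a fixed four-byte varint and decodes it again. Four 7-bit groups cap the representable size at 2^28 - 1 bytes, just under 256 MiB.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kSizeFieldBytes = 4;
constexpr uint32_t kMaxRedundantSize = (1u << (7 * kSizeFieldBytes)) - 1;

// Writes `value` as a varint that always occupies exactly 4 bytes, padding
// with continuation bits so the field can later be back-filled in place.
void WriteRedundantVarInt(uint32_t value, uint8_t* dst) {
  assert(value <= kMaxRedundantSize);
  for (size_t i = 0; i < kSizeFieldBytes; i++) {
    uint8_t group = static_cast<uint8_t>(value & 0x7f);
    value >>= 7;
    // All bytes but the last carry the continuation (MSB) bit.
    dst[i] = static_cast<uint8_t>(i < kSizeFieldBytes - 1 ? (group | 0x80)
                                                          : group);
  }
}

// Ordinary varint decoder: it accepts both the minimal and redundant forms.
// Assumes a well-formed, terminated varint of at most 10 bytes.
uint64_t ReadVarInt(const uint8_t* src, size_t* consumed) {
  uint64_t result = 0;
  size_t i = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    uint8_t byte = src[i++];
    result |= static_cast<uint64_t>(byte & 0x7f) << shift;
    if (!(byte & 0x80)) break;
  }
  *consumed = i;
  return result;
}
```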

At the C++ level, the `protozero::Message` class holds a pointer to its `size`
field, which typically points to the beginning of the message, where the four
bytes are reserved, and back-fills it in the `Message::Finalize()` pass.
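
Putting this together for the `add_nested()` example above, here is a toy reserve-then-back-fill writer on a single contiguous `std::vector<uint8_t>`. `ToyNestedMsg`, `PutVarInt()` and this `Finalize()` are hypothetical, not ProtoZero's API. The sketch remembers an offset rather than a raw pointer because a `std::vector` may reallocate as it grows, whereas ProtoZero's chunks stay put in memory.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends `v` as a minimal base-128 varint.
void PutVarInt(std::vector<uint8_t>* out, uint64_t v) {
  while (v >= 0x80) {
    out->push_back(static_cast<uint8_t>(v | 0x80));
    v >>= 7;
  }
  out->push_back(static_cast<uint8_t>(v));
}

// Hypothetical nested-message writer: it reserves 4 bytes for its size up
// front and back-fills them in Finalize().
class ToyNestedMsg {
 public:
  ToyNestedMsg(std::vector<uint8_t>* out, uint32_t field_id) : out_(out) {
    PutVarInt(out_, (static_cast<uint64_t>(field_id) << 3) | 2);  // Preamble.
    size_field_offset_ = out_->size();
    out_->resize(out_->size() + 4);  // Placeholder: size not known yet.
  }

  void set_str_val(const std::string& s) {
    PutVarInt(out_, (1 << 3) | 2);  // Field 1, length-delimited.
    PutVarInt(out_, s.size());
    out_->insert(out_->end(), s.begin(), s.end());
  }

  void set_int_val(int32_t v) {
    PutVarInt(out_, 2 << 3);  // Field 2, varint.
    PutVarInt(out_, static_cast<uint64_t>(static_cast<int64_t>(v)));
  }

  // Writes the payload size into the reserved bytes as a redundant varint.
  void Finalize() {
    size_t payload_start = size_field_offset_ + 4;
    size_t size = out_->size() - payload_start;
    assert(size < (size_t{1} << 28));  // Must fit in four 7-bit groups.
    for (size_t i = 0; i < 4; i++) {
      uint8_t group = static_cast<uint8_t>(size & 0x7f);
      size >>= 7;
      (*out_)[size_field_offset_ + i] =
          static_cast<uint8_t>(i < 3 ? (group | 0x80) : group);
    }
  }

 private:
  std::vector<uint8_t>* out_;
  size_t size_field_offset_ = 0;
};
```

Because fields are appended in call order, `int_val` precedes `str_val` here (`1a 87 80 80 00 10 2a 0a 03 66 6f 6f`), unlike the field-number-ordered libprotobuf output shown earlier; protobuf parsers accept fields in any order, so both encodings decode to the same message.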

This works fine for cases where the entire message lies in one contiguous buffer
but opens a further challenge: a message can be several MB in size. Looking at
this from the overall tracing perspective, the shared memory buffer chunk that
holds the beginning of a message can be long gone (i.e. committed in the central
service buffer) by the time we get to the end.

In order to support this use case, at the tracing code level (outside of
ProtoZero), when a message crosses the buffer boundary, its `size` field gets
redirected to a temporary patch buffer
(see [patch_list.h](/src/tracing/core/patch_list.h)). This patch buffer is then
sent out-of-band, piggybacking over the next commit IPC (see
[Tracing Protocol ABI](/docs/design-docs/api-and-abi.md#tracing-protocol-abi)).
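
A minimal sketch of the redirection idea, under heavy assumptions: `ToyPatch`, `ToyPatchList` and `ApplyToyPatches()` are hypothetical and do not mirror the actual types in `patch_list.h`, chunk IDs are plain indices, and a `std::vector` of byte vectors stands in for the chunks already committed to the central service buffer.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <list>
#include <vector>

// Hypothetical patch entry: "write these 4 size bytes at `offset` inside the
// chunk identified by `chunk_id`".
struct ToyPatch {
  uint32_t chunk_id;
  size_t offset;
  uint8_t size_field[4];
};

// Producer side. When the chunk holding a message's size field has already
// been committed, the message's size-field pointer is redirected here, so
// Finalize() back-fills the patch instead of the (gone) chunk memory.
class ToyPatchList {
 public:
  uint8_t* Redirect(uint32_t chunk_id, size_t offset) {
    patches_.push_back(ToyPatch{chunk_id, offset, {0, 0, 0, 0}});
    return patches_.back().size_field;
  }
  const std::list<ToyPatch>& patches() const { return patches_; }

 private:
  // std::list keeps element addresses stable while more patches are added.
  std::list<ToyPatch> patches_;
};

// Consumer side: once the patches arrive out-of-band, copy each one into the
// committed chunk it refers to.
void ApplyToyPatches(const ToyPatchList& list,
                     std::vector<std::vector<uint8_t>>* committed_chunks) {
  for (const ToyPatch& patch : list.patches()) {
    assert(patch.chunk_id < committed_chunks->size());
    std::vector<uint8_t>& chunk = (*committed_chunks)[patch.chunk_id];
    assert(patch.offset + sizeof(patch.size_field) <= chunk.size());
    memcpy(chunk.data() + patch.offset, patch.size_field,
           sizeof(patch.size_field));
  }
}
```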

### Performance characteristics

NOTE: For the full code of the benchmark see
`/src/protozero/test/protozero_benchmark.cc`

We consider two scenarios: writing a simple event and a nested event.

#### Simple event

Consists of filling a flat proto message with 4 integers (2 x 32-bit,
2 x 64-bit) and a 32-byte string, as follows:

```c++
template <typename T>
void FillMessage_Simple(T* msg) {
  msg->set_field_int32(...);
  msg->set_field_uint32(...);
  msg->set_field_int64(...);
  msg->set_field_uint64(...);
  msg->set_field_string(...);
}
```

#### Nested event

Consists of filling a similar message which is recursively nested 3 levels deep:

```c++
template <typename T>
void FillMessage_Nested(T* msg, int depth = 0) {
  FillMessage_Simple(msg);
  if (depth < 3) {
    auto* child = msg->add_field_nested();
    FillMessage_Nested(child, depth + 1);
  }
}
```

#### Comparison terms

We compare, for the same message type, the performance of ProtoZero,
libprotobuf and a speed-of-light serializer.

The speed-of-light serializer is a very simple C++ class that just appends
data into a linear buffer making all sorts of favourable assumptions. It does
not use any binary-stable encoding, it does not perform bounds checking,
all writes are 64-bit aligned, and it doesn't deal with any thread-safety.

```c++
#include <cstdint>
#include <cstring>

struct SOLMsg {
  template <typename T>
  void Append(T x) {
    // The memcpy will be elided by the compiler, which will emit just a
    // 64-bit aligned mov instruction.
    memcpy(reinterpret_cast<void*>(ptr_), &x, sizeof(x));
    ptr_ += sizeof(x);
  }

  void set_field_int32(int32_t x) { Append(x); }
  void set_field_uint32(uint32_t x) { Append(x); }
  void set_field_int64(int64_t x) { Append(x); }
  void set_field_uint64(uint64_t x) { Append(x); }
  void set_field_string(const char* str) {
    // strcpy() returns its destination, so advance ptr_ explicitly instead.
    size_t len = strlen(str) + 1;  // Include the NUL terminator.
    memcpy(ptr_, str, len);
    ptr_ += len;
  }

  // g_fake_input_simple is the benchmark's input data, defined elsewhere.
  alignas(uint64_t) char storage_[sizeof(g_fake_input_simple) + 8];
  char* ptr_ = &storage_[0];
};
```

The speed-of-light serializer serves as a reference for _how fast a serializer
could be if argument marshalling and bounds checking were zero cost._

#### Benchmark results

##### Google Pixel 3 - aarch64

```bash
$ cat out/droid_arm64/args.gn
target_os = "android"
is_clang = true
is_debug = false
target_cpu = "arm64"

$ ninja -C out/droid_arm64/ perfetto_benchmarks && \
  adb push --sync out/droid_arm64/perfetto_benchmarks /data/local/tmp/perfetto_benchmarks && \
  adb shell '/data/local/tmp/perfetto_benchmarks --benchmark_filter=BM_Proto*'

------------------------------------------------------------------------
Benchmark                               Time        CPU   Iterations
------------------------------------------------------------------------
BM_Protozero_Simple_Libprotobuf       402 ns     398 ns      1732807
BM_Protozero_Simple_Protozero         242 ns     239 ns      2929528
BM_Protozero_Simple_SpeedOfLight      118 ns     117 ns      6101381
BM_Protozero_Nested_Libprotobuf      1810 ns    1800 ns       390468
BM_Protozero_Nested_Protozero         780 ns     773 ns       901369
BM_Protozero_Nested_SpeedOfLight      138 ns     136 ns      5147958
```

##### HP Z920 workstation (Intel Xeon E5-2690 v4) running Linux

```bash
$ cat out/linux_clang_release/args.gn
is_clang = true
is_debug = false

$ ninja -C out/linux_clang_release/ perfetto_benchmarks && \
  out/linux_clang_release/perfetto_benchmarks --benchmark_filter=BM_Proto*

------------------------------------------------------------------------
Benchmark                               Time        CPU   Iterations
------------------------------------------------------------------------
BM_Protozero_Simple_Libprotobuf       428 ns     428 ns      1624801
BM_Protozero_Simple_Protozero         261 ns     261 ns      2715544
BM_Protozero_Simple_SpeedOfLight      111 ns     111 ns      6297387
BM_Protozero_Nested_Libprotobuf      1625 ns    1625 ns       436411
BM_Protozero_Nested_Protozero         843 ns     843 ns       849302
BM_Protozero_Nested_SpeedOfLight      140 ns     140 ns      5012910
```