docs/design-docs/trace-buffer.md - third_party/perfetto - Git at Google

 # TraceBuffer V2 Design document

 ## Overview

 This document covers the design of TraceBufferV2, which is the 2025 rewrite
 of the core trace buffer code in occasion of ProtoVM.

 TraceBuffer is the non-shared userspace buffer that is used by the tracing
 service to hold traced data in memory, until it's either read back or written
 into a file. There is one TraceBuffer instance for each `buffers` section of the
 [trace config](/docs/concepts/config.md)

 ## Basic operating principles

 NOTE: This section assumes you are familiar with the core concepts exposed in
 [Buffers and dataflow](/docs/concepts/buffers.md).

 TraceBuffer is a _ring buffer on steroids_. Unfortunately due to the
 complications of the protocol (see [Challenges](#key-challenges) section) it is
 far from a plain byte-oriented FIFO ring buffer when it comes to readback and
 deletions.

 Before delving into its complications, let's explore its key operations.

 Logically TraceBuffer deals with overlapping streams of data, called
 _TraceWriter Sequences_, or in short just _Sequences_:

 * A client process that writes trace data acts as a _Producer_. Typically
   1 Producer = 1 Process, but there are cases where a process can host >1
   producers (e.g. if it uses N libraries each statically linking the tracing
   SDK)
 * A Producer declares DataSources, which is the unit of enablement/configuration
   in the buffer. DataSources, however, don't make it as an abstraction in the
   buffer. Only TracingService knows about data sources.
 * Each data source uses one or more TraceWriter, typically one per thread.
 * A TraceWriter writes linear sequences of TracePackets.

 From a TraceBuffer viewpoint, the only abstraction visible are the Producer
 (identified by a `uint16_t ProducerID`) and TraceWriter (identified by a
 `uint16_t WriterID`, within the scoped of a producer). The 32-bit tuple
 `{ProducerId, WriterID}` constitutes the unique Sequence ID for TraceBuffer.
 Everything in TraceBuffer is keyed by that.

 Basic operation:

 * Producers commit "chunks" into the SMB (Shared Memory Buffer).
 * A Chunk belongs to a `{ProducerID,WriterID}`, has a sequence ID and flags.
 * A SMB Chunk contains one or more fragments.
 * Typically 1 fragment == 1 packet, with the exception of the first and last
   fragment which MIGHT be continuations of longer fragmented packets. Note that
   a chunk could contain only one fragment that happens to be a continuation
   of a larger packet.
 * Chunks are copied almost as-is into the TraceBuffer, +- some metadata
   tracking (more details later).
 * At readback time, TraceBuffer reconstructs the sequence of packets and
   reassembles larger fragmented packets.
 * Reading is a destructive operation.
 * However the destructivity of read involves (almost) the same logic of readback
   to reconstruct packets being overwritten (and in future pass them to ProtoVM).

 Readback gives the following guarantees:

 * TraceBuffer only outputs fully formed packets which are valid
   protobuf-encoded TracePacket messages. Packets that are missing fragments, are
   missing patches or are invalid are dropped.
 * Data drops are always tracked and reported through the
   `TracePacket.previous_packet_dropped` flag.
 * TraceBuffer tries very hard to avoid _hiding_ valid data: a missing fragment
   or other similar protocol violations should not invalidate the rest of the
   data for the sequence.
 * Packets for a sequence are always read back FIFO in the same order of writing.
 * TraceBuffer tries hard to also respect FIFO-ness of packets belonging to
   different sequences (this is a new behaviour introduced by TraceBufferV2). So
   data is read back roughly in the same order it has been written (+- pending
   patches and data losses which can cause jumps).

 Readback happens in the following cases:

 * Read via IPC at the end of the trace: this is what perfetto_cmd does by
   default and is used in most tracing scenarios today. All the contents of the
   buffer are read after tracing stops.
 * Periodic reads into file: this happens in
   [long tracing mode][lt]. Every O(seconds)
   (configurable) the buffer is read and the packets extracted are written into
   the file descriptor passed by the consumer.
 * Periodics reads over IPC: these are rare. Some third-party tools like
   [GPU Inspector](https://gpuinspector.dev) do that. Architecturally they are
   no different from the case of read into file. TraceBuffer isn't aware of any
   difference between "reading into a file" or "reading via IPC". Those concepts
   exist only in TracingServiceImpl.

 Code-wise there are four main entry-points:

 Writer-side (Producer-side):

 * `CopyChunkUntrusted()`: called when a CommitData IPC is received, or when the
   service performs SMB Scraping (more below)
 * `TryPatchChunkContents()`: still part of CommitData IPC.

 Reader-side:

 * `BeginRead()`: called at readback time at the beginning of each read batch.
 * `ReadNextTracePacket()`: called once for each packet until either there are
   no more packets in the buffer or TracingServiceImpl decided it has read
   enough data for the current task (to avoid saturating the IPC channel).

 ## Key challenges

 ### RING_BUFFER vs DISCARD

 TraceBuffer can operate in two modes.

 #### RING_BUFFER

 This is the mode used by the majority of the traces. It is also the one with
 the most complications. This document focuses on the operation in RING_BUFFER
 mode, unless otherwise specified.
 This mode can be used for pure ring buffer tracing, or can be coupled with
 `write_into_file` to have [long traces][lt]
 streamed into disk, in which case the
 ring buffer serves mainly to decouple the SMB and the I/O activity (and to handle
 fragment reassembly).

 [lt]: /docs/concepts/config.md#long-traces

 #### DISCARD

 This mode is used for one-shot traces where the user cares about the left-most
 part of the trace. This is conceptually easier: once reached the end of the
 buffer, TraceBuffer stops accepting data.

 There is a slight behavioural change from the V1 implementation. V1 tried
 to be (too) smart about DISCARD and allowed to keep writing data into the buffer
 as long as the write and read cursors never crossed (i.e. as long as the reader
 caught up).
 This turned out to be useless and confusing: coupling `DISCARD` with
 `write_into_file` leads to a scenario where DISCARD behaves almost like
 a RING_BUFFER. However if the reader doesn't catch up (e.g. due to lack of CPU
 bandwidth), TraceBuffer stops accepting data (forever).
 We later realized this was a confusing feature (a ring buffer that suddenly
 stops) and added warnings when trying to combine the two.

 V2 doesn't try to be smart about readbacks and simply stops once we reach the
 end of the buffer, whether it has been read or not.

 ### Fragmentation

 Packet fragmentation is the cause of most of TraceBuffer's design complexity.

 ```
 Simple Fragmentation Example:
 Chunk A (ChunkID=100)      Chunk B (ChunkID=101)      Chunk C (ChunkID=102)
 ┌─────────────────────┐    ┌─────────────────────┐    ┌─────────────────────┐
 │[Packet1: Complete]  │    │[Packet2: Begin]     │    │[Packet2: Continue]  │
 │[Packet2: Begin]     │    │ flags: kContOnNext  │    │ flags: kContFromPrev│
 │ flags: kContOnNext  │    └─────────────────────┘    │[Packet2: End]       │
 └─────────────────────┘                               │[Packet3: Complete]  │
                                                       └─────────────────────┘

 Fragmentation Chain: Packet2 = [Begin] → [Continue] → [End]
 ```

 **Key Fragmentation Challenges**:

 * **Out-of-order commits**: Chunks may arrive out of ChunkID order due to SMB scraping
 * **Missing fragments**: Gaps in ChunkID sequence cause packet drops
 * **Patch dependencies**: Chunks marked `kChunkNeedsPatching` block readback until patched
 * **Buffer wraparound**: Fragmented packets may span buffer wraparound boundaries

 ### Out-of-order commits

 Out of order commits are rare but regularly present. They happen due to a
 feature called _SMB Scraping_ introduced in the early years of Perfetto.

 SMB scraping happens when TracingServiceImpl, upon a _Flush_, forcefully reads
 the chunks in the SMB, even if they are not marked as completed, and writes them
 into the trace buffer.

 This was necessary to deal with data sources like TrackEvent that can be used on
 arbitrary threads that don't have a TaskRunner, where would it be impossible to
 issue a PostTask(FlushTask) upon a trace protocol Flush request.

 The challenge is that TracingServiceImpl, when scraping, scans the SMB in linear
 order and commits chunks as found. But that linear order translates into
 "chunk allocation order", which is unpredictable, effectively causing chunks
 to be committed in random order.

 In practice, these instances are relatively rare, as they happen:

 * Only when stopping the trace, for most traces.
 * every O(seconds) in the case of [long tracing mode][lt]. Hence they must
   be supported, but not optimized for.

 Important note: TraceBuffer assumes that all out-of-order commits are batched
 together atomically. The only known use case for OOO is SMB scraping, which
 commits all scraped chunks in one go within a single TaskRunner task.

 Hence we assume that the following cannot happen:

 * Task 1 (IPC message)
   * Commit chunk 1
   * Commit chunk 3
 * Task 2
   * ReadBuffers (e.g. due to periodic write_into_file)
 * Task 3 (IPC message)
   * Commit chunk 2

 The logic on TraceBufferV2 treats any ChunkID gaps identified, after having
 sorted chunks by ChunkID, as data losses.

 ### Tracking of data losses

 There are several paths that can lead to a data loss, and TraceBuffer must track
 and report all of them. Debugging data losses is a very common activity. It's
 extremely important that TraceBuffer signals any case of data loss.

 There are several different types and causes of data losses:

 * SMB exhaustion: this happens when a TraceWriter drops data because the SMB
   is full. TraceWriters signal this by appending a special fragment of size
   `kPacketSizeDropPacket` at the end of the next chunk.
 * Fragment reassembly failure: this happens when TraceBuffer tries to reassemble
   a fragmented packet and realizes there is a gap in the sequence of ChunkIDs
   (typically due to a chunk being overwritten in ring-buffer mode).
 * Sequence gaps: this happens when there is no fragmentation but two chunks have
   a discontinuity in their ChunkID(s). This happens due to ring-buffer
   overwrites or due to some other issue when writing in the SMB.
 * ABI violations: this happens when a Chunk is malformed, for instance:
   * One of its fragments has a size that goes out of bounds.
   * The first fragment has a "fragment continuation" flag, but there was no
     fragment previously initiated.

 ### Patches

 When a packet spans across several fragments, almost always it involves patching
 due to the nature of protobuf encoding. The problem is the following:

 * A TraceWriter starts writing a packet at the end of a chunk.
 * By doing so it starts writing a protobuf message (at very least for the root
   TracePacket proto). More protobuf messages might be nested and open while
   writing (e.g., writing a TracePacket.ftrace_events bundle)
 * The TraceWriter runs out of space in the Chunk. So it commits the current
   chunk in the SMB and acquires a new one to continue writing.
 * The chunk being committed contains the preamble with the message size. However
   that preamble at the moment is filled with zeros, because we don't know yet
   the size of the message(s), as they are still being written.
 * Only when the nested messages end TraceWriter can possibly know the size of
   the messages to put in the preamble. But at this point the chunk containing
   the preamble has been committed in the SMB. TraceWriters cannot touch
   committed chunks. More importantly, they might have been already consumed by
   the TracingService.
 * To deal with this, the IPC protocol exposes the ability to patch a Chunk via
   IPC, with the semantics: _If you (TracingService/TraceBuffer) still have
   ChunkID 1234_ for my `{ProducerID,WriterID}` patch offset X with contents
   `[DE,AD, BE,EF]`.

 From a protocol viewpoint, only the last fragment of a chunk can be patched:

 * Non-fragmented chunks don't require any patches.
 * The first fragment of a chunk (which is the last of the chain) by design
   does not need any patching as there is no further fragment.
 * Note that a chunk can contain a single fragment which is both the first and
   last, in the middle of a fragmentation chain. This can also need patching.
 * In general given a packet fragmented in N fragments, all but the last
   fragment can (and generally will) need patching.

 The information about "needs patching" is held in the SMB's Chunk flags
 (`kChunkNeedsPatching`).

 The `kChunkNeedsPatching` state is cleared by the `CommitData` IPC, which
 contains, alongside the patches offset and payload, the `bool has_more_patches`
 flag. When false, it causes the `kChunkNeedsPatching` state to be cleared.

 From a TraceBuffer viewpoint patching has the following implications:

 * Chunks that are pending patches cause a stalling of the readback, for the
   sequence.
 * Stalling is not synchronous. TraceBuffer simply acts as if there was no more
   data for the sequence, either by moving on to other sequences or by returning
   false in ReadNextTracePacket() when all the other sequences have been read.
 * Fragment reassembly stops gracefully in presence of a chunk that
   has patches pending, without destroy any data or signalling a data loss.
 * Because patches travel over IPC, and the IPC channel is by design non-lossy
   we stall the sequence for arbitrary time in presence of missing patches.
 * The stalling, however, cannot affect other sequences.
 * So a fragmented packet missing patches can cause a long chain of packets (for
   the same TraceWriter sequences) to never be propagated in output when reading
   back, but cannot stall other sequences.
 * However, if the chunks that are pending patches get overwritten by newer data,
   the stalling ends, and TraceBuffer shall keep reading next packets, signalling
   the data loss.

 ### Recommits

 Recommit means committing again a chunk that exists in the buffer with the same
 ChunkID.
 The only legitimate case of recommit is SMB scraping followed by an actual
 commit. We don't expect nor support the case of Producers trying to re-commit
 the same chunk N times, as that unavoidably leads to undefined behaviour (what
 if the tracing service has written the packets to a file already?).

 This is the scenario when recommit can legitimately happen:

 * SMB scraping happens, and TracingService calls CopyChunkUntrusted for a chunk
   that is still being written by the producer. While doing this it signals the
   condition to TraceBuffer passing the `chunk_complete=false` argument.
 * The chunk is copied into the TraceBuffer. By design when committing a scraped
   (incomplete) chunk TraceBuffer ignores the last fragment, because it cannot
   tell if the producer is still writing it or not.
 * Later on the TraceWriter (who is unaware of SMB scraping) finishes writing the
   chunk and commits it.
 * TraceBuffer at this point overwrites the chunk, potentially extending it with
   the last fragment.

 NOTE: kChunkNeedsPatching and kIncomplete are two different and orthogonal
 chunk states. kIncomplete has nothing to do with fragments and is purely about
 SMB scraping (and the fact that we have to be conservative and ignore the last
 fragment).

 Implications:

 * An incomplete chunk causes a read stall for the sequence similar to what
   kChunkNeedsPatching does.
 * Similarly to the former case, the stall is withdrawn if the chunk gets
   overwritten.
 * TraceBuffer never tries to read the last fragment of an incomplete chunk.
 * As such an incomplete chunk cannot be fragmented on the ending side (phew).

 The main complication of incomplete chunks is that we cannot know upfront the
 size of their payload. Because of this we have to conservatively copy and
 reserve in the buffer the whole chunk size.

 ### Buffer cloning

 Buffer cloning happens via the CloneReadOnly() method. As the name suggests
 it creates a new TraceBuffer instance which contains the same contents but can
 only be read into. This is to support `CLONE_SNAPSHOT` triggers.

 Architecturally buffer cloning is not particularly complicated, at least in
 the current design. The main design implications are:

 * Ensuring that no state in the TraceBuffer fields contains pointers.
 * For this reason, the core structure in the buffer use offsets rather than
   pointers (which also happen to be more memory compact and cache-friendly).
 * Stats and auxiliary metadata tends to be the thing that requires some care,
   where bugs can occasionally hide.

 ### ProtoVM

 ProtoVM is an upcoming feature to TracingService. It's a non-turing-complete
 VM language to describe proto merging operations, to keep track of the state
 of arbitrary proto-encoded data structures as we overwrite data in the
 trace buffer.
 ProtoVM has been the reason that triggered the redesign of TraceBuffer V2.

 Without getting into its details, the primary requirement of ProtoVM is the
 following: when overwriting chunks in the trace buffer, we must pass valid
 packets from these soon-to-be-deleted chunks to ProtoVM. We must do so in order,
 hence replicating the same logic that we would use when doing a readback.

 Internal docs about ProtoVM:

 * [go/perfetto-proto-vm](http://go/perfetto-proto-vm)
 * [go/perfetto-protovm-implementation](http://go/perfetto-protovm-implementation)

 ### Overwrites

 For the aforementioned ProtoVM reasons, in the V2 design, the logic that deals
 with ring buffer overwrites (`DeleteNextChunksFor()`) is almost identical - and
 shares most of its code with - the readback logic.

 I say almost because there is (of course) a subtle difference: when deleting
 chunks, stalling (either due to pending patches or incompleteness) is NOT an
 option. Old chunks MUST go to make space to new chunks, no matter what.

 So overwrites are the equivalent of a no-stalling force-delete readback.

 ## Core design

 There are two main data structures involved:

 #### `TBChunk`

 ![tbchunk](/docs/images/tracebuffer-design/tbchunk.drawio.svg)

 Is the struct, stored in the trace buffer memory as a result of calling
 CopyChunkUntrusted from a SMB chunk.

 A TBChunk is very similar to a SMB chunk with the following caveats:

 * The sizeof() of both is the same (16 bytes). This is very important to keep
   patches offsets consistent.

 * The SMB chunk maintains a counter of fragments. TBChunk instead does
   byte-based bookkeeping, as that reduces the complexity of the read iterators.

 * The layout of the fields is slightly different, but they both contains
   ProducerID, WriterID, ChunkID, fragment counts/sizes and flags.
   The SMB chunk layout is an ABI. The TCHunk layout is not: it is an
   implementation detail and can change.

 * TBChunk maintains a basic checksum for each chunk (used only in debug builds).

 In a nutshell TBChunk is:

 * A linear buffer of `base::PagedMemory` contains a sequence of chunks.
 * Each chunk is prefixed by a `struct TBChunk` header followed by its fragments'
   payload.
 * The TBChunk header contains also
   * The read state (how many bytes of fragments have been consumed)
   * ABI Flags
     * kFirstPacketContinuesFromPrevChunk
     * kLastPacketContinuesOnNextChunk
     * kChunkNeedsPatching
   * Local flags
     * kChunkIncomplete (for SMB-scraped chunks)

 ### SequenceState

 It maintains the state of a `{ProducerID, WriterID}` sequence.

 Its important feature is maintaining an ordered list (logically) of TBChunk(s)
 for that sequence, sorted by ChunkID order.
 The "list" is actually a CircularQueue of offsets, which has O(1)
 `push_back()` and `pop_front()` operations.

 * TraceBuffer holds a hashmap of `ProducerAndWriterId` -> `SequenceState`.
 * There is one `SequenceState` for each {Producer,Writer} active in the buffer.
 * `SequenceState` holds:
   * The identity of the producer (uid, pid, ...)
   * The `last_chunk_id_consumed`, to detect gaps in the ChunkID sequence
     (data losses)
   * A sorted list (a `CircularQueue<size_t>`) of chunks, which stores their
     offset in the buffer.
 * The `chunks` queue is maintained sorted and updated as chunks are appended and
   consumed (removed) from the buffer.

 The lifetime of a `SequenceState` has a subtle tradeoff:

 * On one hand, we could destroy a SequenceState when the last chunk for a
   sequence has been read or overwritten.
 * After all we must delete SequenceState(s) at some point. Doing otherwise would
   cause memory leaks in long running traces if we have many threads coming and
   going, as up to 64K sequences per producer are possible.
 * On the other hand, deleting sequences too aggressively have a drawback: we
   cannot detect data losses in long-trace mode
   (see [Issue #114](https://github.com/google/perfetto/issues/114) and
   [b/268257546](http://b/268257546)). [Long trace mode][lt] periodically
   consumes the buffer, hence making all sequences eligible to be destroyed if we
   were to be aggressive.
 * The problem here lies in the fact that `SequenceState` holds the
   `last_chunk_id_consumed` which is used to detect gaps in the chunk ids .

 TraceBufferV2 balances this using a lazy sweeping approach: it allows the most
 recently deleted `SequenceState`s to stay alive, up to
 `kKeepLastEmptySeq = 1024`. See `DeleteStaleEmptySequences()`.

 ### FragIterator

 A simple class that tokenizes fragments in a chunk and allows forward-only
 iteration.

 It deals with untrusted data, detecting malformed / out of bounds scenarios.

 It does not alter the state of the buffer.

 ### ChunkSeqIterator

 A simple utility class that iterates over the ordered list of TBChunk for
 a given SequenceState. It merely follows the SequenceState.chunks queue
 and detects gaps.

 ### ChunkSeqReader

 Encapsulates most of the readback complexity. It reads and consumes chunks
 in sequence order, as follows:

 * When constructed, the caller must pass a target TBChunk as argument. This is
   the chunk where we will stop the iteration *.
 * At readback time this is the next chunk in the buffer that we want to read.
 * At overwrite time this is the chunk that we are about to overwriter.
 * In both cases, because of OOO commits, the next chunk in buffer-order might
   not necessarily be the next chunk that should be consumed in FIFO order
   (although in the vast majority cases we expect them to be in order).
 * Upon construction, it rewinds all the way back in the `SequenceState.chunks`
   (using `ChunkSeqIterator`) and starts the iteration from there.
 * It keeps reading packets until we reach the target TBChunk passed in the
   constructor.
 * In some cases (fragmentation) it might read beyond the target chunk. This is
   to reassemble a packet that started in the target chunk and continued later
   on.
 * When doing so it just consumes the fragment required for reassembly and leaves
   the other packets in the chunk untouched, to preserve global FIFO-ness.

 ### Buffer order vs Sequence order

 Chunks can be visited in two different ways:

 1. Buffer order: in the order they have been written in the buffer.
    In the example below: A1, B1, B3, A2, B2

 2. In Sequence order: in the order they appear in the SequenceState's list.

 ![core design](/docs/images/tracebuffer-design/core-design.drawio.svg)

 ### Writing chunks

 When chunks are written via `CopyChunkUntrusted()` a new `TBChunk` is
 allocated in the buffer's PagedMemory using the usual bump-pointer pattern
 you'd expect from a ring-buffer. Chunks are variable-size, and are stored
 contiguously with 32-bit alignment.

 The offset of the chunk is also appended in the `SequenceState.chunks` list.

 After the first wrapping, writing a chunk involves deleting one or more
 existing chunks. The deletion operation `RemoveNextChunksFor()` is as complex
 as a readback, because it reconstructs packets being deleted in order, to pass
 them to ProtoVM.

 So the writing itself is straightforward, but the deletion (overwrite) of
 existing chunks is where most of the complexity lies. This is described in
 the next section.

 #### DeleteNextChunksFor() Flow

 ```mermaid
 flowchart TD
     A[DeleteNextChunksFor<br/>bytes_to_clear] --> B[Initialize: off = wr_<br/>clear_end = wr_ + bytes_to_clear]

     B --> C{off < clear_end?}
     C -->|No| M[Create padding chunks<br/>for partial deletions]

     C -->|Yes| D{off >= used_size_?}
     D -->|Yes| N[Break - nothing to delete<br/>in unused space]

     D -->|No| E[chunk = GetTBChunkAt off]
     E --> F{chunk.is_padding?}
     F -->|Yes| G[Update padding stats<br/>off += chunk.outer_size]
     F -->|No| H[Create ChunkSeqReader<br/>in kEraseMode]

     H --> I[ReadNextPacketInSeqOrder loop]
     I --> J{Packet found?}
     J -->|Yes| K[Pass packet to ProtoVM<br/>has_cleared_fragments = true]
     J -->|No| L{has_cleared_fragments?}

     K --> I
     L -->|Yes| O[Mark sequence data_loss = true]
     L -->|No| P[No data loss]

     O --> Q[Update overwrite stats<br/>off += chunk.outer_size]
     P --> Q

     Q --> R{More chunks in range?}
     R -->|Yes| C
     R -->|No| M

     G --> C

     M --> S[Scan remaining range for padding]
     S --> T{Partial chunk at end?}
     T -->|Yes| U[Create new padding chunk<br/>for remaining space]
     T -->|No| V[End]

     U --> V
     N --> V

     style A fill:#e1f5fe
     style K fill:#fff3e0
     style O fill:#ffcdd2
     style V fill:#c8e6c9
 ```

 **Key Differences from ReadNextTracePacket:**

 * **No stalling**: Chunks marked as incomplete or needing patches are
   force-deleted
 * **ProtoVM integration**: Valid packets are reconstructed and passed to ProtoVM
   before deletion
 * **Padding management**: Creates padding chunks for partial deletions at range
   boundaries
 * **Stats tracking**: Updates overwrite statistics rather than read statistics

 ### Reading back packets

 Readback (`ReadNextTracePacket()`) is where most of the TraceBuffer's complexity
 lies, as it needs to reassembles packets from fragments, deal with gaps / data
 losses, and deal with interleaving of chunks from different sequences, and out
 of ordering.

 ```mermaid
 flowchart TD
     A[ReadNextTracePacket Start] --> B{chunk_seq_reader_ exists?}

     B -->|No| C[Get chunk at rd_]
     C --> D{Is chunk padding?}
     D -->|Yes| E[rd_ += chunk.outer_size<br/>Check wrap around]
     D -->|No| F[Create ChunkSeqReader<br/>for this chunk]

     B -->|Yes| G[ReadNextPacketInSeqOrder]
     F --> G

     G --> H{Packet found?}
     H -->|Yes| I[Set sequence properties<br/>Set data_loss flag<br/>Return packet]
     H -->|No| J[Get end chunk from reader<br/>rd_ = end_offset + size]

     J --> K{rd_ == wr_ OR<br/>wrapped to wr_?}
     K -->|Yes| L[Return false - no more data]
     K -->|No| M[Reset chunk_seq_reader_<br/>Handle wrap around]

     E --> N{rd_ wrapped around?}
     N -->|Yes| O[rd_ = 0]
     N -->|No| P[Continue with new rd_]

     O --> K
     P --> K
     M --> B

     style A fill:#e1f5fe
     style I fill:#c8e6c9
     style L fill:#ffcdd2
 ```

 #### ChunkSeqReader Internal Flow

 This is how ReadNextTracePacket() works:

 * We start the read iteration immediately after the write cursor. Because writes
   are simply FIFO, the oldest data in the buffer is by design the one after the
   write cursor.
 * For simplicity let's ignore fragmentation for now and assume that every chunk
   is self-contained (i.e. every chunk contains N fragments = N packets).
 * If we assume no fragmentation, and if we also assume no out-of-order commits
   (i.e. no scraping) we could just iterate linearly in buffer order, and visit
   the chunks until we reach back the write cursor.
 * So we could just tokenize packets out of each chunk, and return one for each
   `ReadNextTracePacket()` invocation. Done.


 ```mermaid
 flowchart TD
     A[ReadNextPacketInSeqOrder] --> B{skip_in_generation?}
     B -->|Yes| C[Return false - stalled]

     B -->|No| D[NextFragmentInChunk]
     D --> E{Fragment found?}
     E -->|Yes| F{Fragment type?}

     F -->|kFragWholePacket| G[ConsumeFragment<br/>Return packet]
     F -->|kFragBegin| H[ReassembleFragmentedPacket]
     F -->|kFragEnd/Continue| I[Data loss - unexpected<br/>ConsumeFragment<br/>Continue loop]

     E -->|No| J{Chunk corrupted?}
     J -->|Yes| K[Mark data_loss = true]

     J -->|No| L{Chunk incomplete?}
     L -->|Yes| M[Set skip_in_generation<br/>Return false]
     L -->|No| N[EraseCurrentChunk]

     K --> N
     N --> O{Reached end chunk?}
     O -->|Yes| P[Return false]
     O -->|No| Q[NextChunkInSequence]

     Q --> R{Next chunk exists?}
     R -->|No| P
     R -->|Yes| S[iter_ = next_chunk<br/>Create new FragIterator]

     S --> D

     H --> T{Reassembly result?}
     T -->|Success| U[Return reassembled packet]
     T -->|NotEnoughData| V[Set skip_in_generation<br/>Return false]
     T -->|DataLoss| W[Mark data_loss = true<br/>Continue loop]

     style A fill:#e1f5fe
     style G fill:#c8e6c9
     style U fill:#c8e6c9
     style C fill:#ffcdd2
     style P fill:#ffcdd2
     style V fill:#fff3e0
 ```

 #### Dealing with out-of-order chunks

 But things are more complicated. Let's take first only out-of-ordering into the
 picture. With reference to the drawing above, let's imagine the write cursor is
 @ offset=48, right before B3.

 If we proceeded simply in buffer order we would break FIFO-ness, as we would
 first emit the packets contained in B3, then A2 (this is fine) and ultimately
 B2 (this is problematic).

 The only valid linearizations that preserve in-sequence FIFO-ness, would be
 either [A2,B2,B3], [B2,B3,A2] or [B2,A2,B3].

 In order to deal with this we introduce a two layer walk in the readback code:

 * The outer layer iterates in buffer order, as that respects the global
   FIFO-ness, trying to get events out in roughly the same order they got in
   (% chunking)
 * At every step, the inner layer proceeds in sequence order, as follows:
   * It takes the next chunk (B3 in the example above) that buffer-order visit
     found.
   * It finds its SequenceState by doing a hash-lookup in the `sequences_` map.
   * It jumps to the first Chunk in the `SequenceState.chunks` ordered list.
   * It proceeds in sequence order until the target chunk (B3) has been reached.
 * The outer layer continues in buffer order and the story repeats.

 In the code, the outer layer walk is implemented by
 `TraceBufferV2::ReadNextTracePacket()` while the inner walk is implemented by
 the `class ChunkSeqReader::ReadNextPacket()`.

 ## Benchmarks

 ### Apple Macbook (M4)

 ```txt
 BM_TraceBuffer_WR_SingleWriter<TraceBufferV1>       bytes_per_second=9.77742G/s
 BM_TraceBuffer_WR_SingleWriter<TraceBufferV2>       bytes_per_second=12.6395G/s
 BM_TraceBuffer_WR_MultipleWriters<TraceBufferV1>    bytes_per_second=8.65385G/s
 BM_TraceBuffer_WR_MultipleWriters<TraceBufferV2>    bytes_per_second=11.7582G/s
 BM_TraceBuffer_RD_MixedPackets<TraceBufferV1>      bytes_per_second=4.27694G/s
 BM_TraceBuffer_RD_MixedPackets<TraceBufferV2>      bytes_per_second=4.35475G/s
 ```

 ### Google Pixel 7

 ```txt
 BM_TraceBuffer_WR_SingleWriter<TraceBufferV1>      bytes_per_second=4.4379G/s
 BM_TraceBuffer_WR_SingleWriter<TraceBufferV2>      bytes_per_second=3.7931G/s
 BM_TraceBuffer_WR_MultipleWriters<TraceBufferV1>   bytes_per_second=3.19148G/s
 BM_TraceBuffer_WR_MultipleWriters<TraceBufferV2>   bytes_per_second=3.47354G/s
 BM_TraceBuffer_RD_MixedPackets<TraceBufferV1>      bytes_per_second=1.26698G/s
 BM_TraceBuffer_RD_MixedPackets<TraceBufferV2>      bytes_per_second=1.35394G/s
 ```