|  | <!--* | 
|  | # Document freshness: For more information, see go/fresh-source. | 
|  | freshness: { owner: 'haberman' reviewed: '2023-02-24' } | 
|  | *--> | 
|  |  | 
|  | # upb vs. C++ Protobuf Design | 
|  |  | 
|  | [upb](https://github.com/protocolbuffers/protobuf/tree/main/upb) is a small C | 
|  | protobuf library. While some of the design follows in the footsteps of the C++ | 
|  | Protobuf Library, upb departs from C++'s design in several key ways.  This | 
|  | document compares and contrasts the two libraries on several design points. | 
|  |  | 
|  | ## Design Goals | 
|  |  | 
|  | Before we begin, it is worth calling out that upb and C++ have different design | 
|  | goals, and this motivates some of the differences we will see. | 
|  |  | 
|  | C++ protobuf is a user-level library: it is designed to be used directly by C++ | 
|  | applications.  These applications will expect a full-featured C++ API surface | 
|  | that uses C++ idioms.  The C++ library is also willing to add features to | 
|  | increase server performance, even if these features would add size or complexity | 
|  | to the library.  Because C++ protobuf is a user-level library, API stability is | 
|  | of utmost importance: breaking API changes are rare and carefully managed when | 
|  | they do occur.  The focus on C++ also means that ABI compatibility with C is not | 
|  | a priority. | 
|  |  | 
|  | upb, on the other hand, is designed primarily to be wrapped by other languages. | 
|  | It is a C protobuf kernel that forms the basis on which a user-level protobuf | 
|  | library can be built.  This means we prefer to keep the API surface as small and | 
|  | orthogonal as possible.  While upb supports all protobuf features required for | 
|  | full conformance, upb prioritizes simplicity and small code size, and avoids | 
|  | adding features like lazy fields that can accelerate some use cases but at great | 
|  | cost in terms of complexity.  As upb is not aimed directly at users, there is | 
|  | much more freedom to make API-breaking changes when necessary, which helps the | 
|  | core to stay small and simple.  We want to be compatible with all FFI | 
|  | interfaces, so C ABI compatibility is a must. | 
|  |  | 
|  | Despite these differences, C++ protos and upb offer | 
|  | [roughly the same core set of features](https://github.com/protocolbuffers/protobuf/tree/main/upb#features). | 
|  |  | 
|  | ## Arenas | 
|  |  | 
|  | upb and C++ protos both offer arena allocation, but there are some key | 
|  | differences. | 
|  |  | 
|  | ### C++ | 
|  |  | 
|  | As a matter of history, when C++ protos were open-sourced in 2008, they did not | 
|  | support arenas.  Originally there was only unique ownership, whereby each | 
|  | message uniquely owns all child messages and will free them when the parent is | 
|  | freed. | 
|  |  | 
|  | Arena allocation was added as a feature in 2014 as a way of dramatically | 
|  | reducing allocation and (especially) deallocation costs.  But the library was | 
|  | not at liberty to remove the unique ownership model, because it would break far | 
|  | too many users.  As a result, C++ has supported a **hybrid allocation model** | 
|  | ever since, allowing users to allocate messages either directly from the | 
|  | stack/heap or from an arena.  The library attempts to ensure that there are | 
|  | no dangling pointers by performing automatic copies in some cases (for example | 
|  | `a->set_allocated_b(b)`, where `a` and `b` are on different arenas). | 
|  |  | 
|  | C++'s arena object, `google::protobuf::Arena`, is **thread-safe** by | 
|  | design, which allows users to allocate from multiple threads simultaneously | 
|  | without external synchronization.  The user can supply an initial block of | 
|  | memory to the arena, and can choose some parameters to control the arena block | 
|  | size.  The user can also supply block alloc/dealloc functions, but the alloc | 
|  | function is expected to always return some memory.  The C++ library in general | 
|  | does not attempt to handle out-of-memory conditions. | 
|  |  | 
|  | ### upb | 
|  |  | 
|  | upb uses **arena allocation exclusively**. All messages must be allocated from | 
|  | an arena, and can only be freed by freeing the arena.  It is entirely the user's | 
|  | responsibility to ensure that there are no dangling pointers: when a user sets a | 
|  | message field, this will always trivially overwrite the pointer and will never | 
|  | perform an implicit copy. | 
|  |  | 
|  | upb's `upb_Arena` is **thread-compatible**, which means it cannot be used | 
|  | concurrently without synchronization. The arena can be seeded with an initial | 
|  | block of memory, but it does not explicitly support any parameters for choosing | 
|  | block size. It supports a custom alloc/dealloc function, and this function is | 
|  | allowed to return `NULL` if no dynamic memory is available. This allows upb | 
|  | arenas to have a max/fixed size, and makes it possible in theory to write code | 
|  | that is tolerant to out-of-memory errors. | 
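
A minimal sketch of the fixed-size idea, assuming a toy bump allocator rather than upb's real arena. `FixedArena` and `FixedArenaDemo` are hypothetical names invented for illustration. The allocator hands out memory from a single caller-supplied block and returns `nullptr` once the block is exhausted, so the caller can handle the failure instead of crashing.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy arena, not part of upb: bump-allocates from one
// caller-supplied block and never asks the system for more memory.
class FixedArena {
 public:
  FixedArena(void* block, std::size_t size)
      : base_(static_cast<unsigned char*>(block)), size_(size), used_(0) {
    assert(block != nullptr);
  }

  // Returns nullptr instead of aborting when the block is exhausted.
  void* Alloc(std::size_t n) {
    const std::size_t kAlign = alignof(std::max_align_t);
    std::size_t rounded = (n + kAlign - 1) / kAlign * kAlign;
    // The first check catches overflow in the rounding above.
    if (rounded < n || rounded > size_ - used_) return nullptr;
    void* p = base_ + used_;
    used_ += rounded;
    return p;
  }

  std::size_t used() const { return used_; }

 private:
  unsigned char* base_;
  std::size_t size_;
  std::size_t used_;
};

// Usage: a 64-byte arena satisfies small requests, then reports exhaustion.
inline bool FixedArenaDemo() {
  alignas(std::max_align_t) unsigned char block[64];
  FixedArena arena(block, sizeof(block));
  void* a = arena.Alloc(16);
  void* b = arena.Alloc(16);
  void* too_big = arena.Alloc(1024);  // does not fit: nullptr, no crash
  return a != nullptr && b != nullptr && a != b && too_big == nullptr;
}
```

Code built on such an allocator has to check every allocation result, which is the price of being tolerant to out-of-memory errors.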
|  |  | 
|  | upb's arena also supports a novel operation known as **fuse**, which joins two | 
|  | arenas together into a single lifetime.  Though both arenas must still be freed | 
|  | separately, none of the memory will actually be freed until *both* arenas have | 
|  | been freed.  This is useful for avoiding dangling pointers when reparenting a | 
|  | message under a parent that may be on a different arena. | 
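
One way to picture fuse is as reference counting over a shared group of arenas. The sketch below is a single-threaded toy model under that assumption, not upb's implementation. `ToyArena`, `Group`, and the `ToyArena_*` functions are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical toy model of fused lifetimes; not upb's implementation and
// not thread-safe.
struct Group {
  Group* parent = nullptr;       // set once this group is merged into another
  int live = 1;                  // arenas not yet freed (meaningful on the root)
  std::vector<void*> blocks;     // memory owned by the group (root only)
  std::vector<Group*> absorbed;  // groups that were merged into this root
};

struct ToyArena {
  Group* group;
};

inline Group* Root(Group* g) {
  while (g->parent != nullptr) g = g->parent;
  return g;
}

inline ToyArena* ToyArena_New() { return new ToyArena{new Group}; }

inline void* ToyArena_Alloc(ToyArena* a, std::size_t n) {
  void* p = std::malloc(n);
  if (p != nullptr) Root(a->group)->blocks.push_back(p);
  return p;
}

// Joins the two lifetimes: afterwards both arenas share one group.
inline void ToyArena_Fuse(ToyArena* a, ToyArena* b) {
  Group* ra = Root(a->group);
  Group* rb = Root(b->group);
  if (ra == rb) return;  // already fused
  ra->live += rb->live;
  ra->blocks.insert(ra->blocks.end(), rb->blocks.begin(), rb->blocks.end());
  ra->absorbed.insert(ra->absorbed.end(), rb->absorbed.begin(),
                      rb->absorbed.end());
  ra->absorbed.push_back(rb);
  rb->blocks.clear();
  rb->absorbed.clear();
  rb->parent = ra;
}

// Frees one arena handle. Returns true only if this was the last live arena
// in its group, in which case all of the group's memory is released.
inline bool ToyArena_Free(ToyArena* a) {
  Group* root = Root(a->group);
  delete a;
  if (--root->live > 0) return false;
  for (void* p : root->blocks) std::free(p);
  for (Group* g : root->absorbed) delete g;
  delete root;
  return true;
}

// Usage: the child's memory outlives its own arena handle after a fuse.
inline bool FuseDemo() {
  ToyArena* parent_arena = ToyArena_New();
  ToyArena* child_arena = ToyArena_New();
  int* child = static_cast<int*>(ToyArena_Alloc(child_arena, sizeof(int)));
  if (child == nullptr) return false;
  *child = 42;
  ToyArena_Fuse(parent_arena, child_arena);
  bool released_early = ToyArena_Free(child_arena);  // group still live
  int seen = *child;                                 // still valid memory
  bool released_last = ToyArena_Free(parent_arena);  // now everything goes
  return !released_early && seen == 42 && released_last;
}
```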
|  |  | 
|  | ### Comparison | 
|  |  | 
|  | **hybrid allocation vs. arena-only** | 
|  |  | 
|  | * The C++ hybrid allocation model introduces a great deal of complexity and | 
|  | unpredictability into the library.  upb benefits from having a much simpler | 
|  | and more predictable design. | 
|  | * Some of the complexity in C++'s hybrid model arises from the fact that arenas | 
|  | were added after the fact.  Designing for a hybrid model from the outset | 
|  | would likely yield a simpler result. | 
|  | * Unique ownership does support some usage patterns that arenas cannot directly | 
|  | accommodate.  For example, you can reparent a message and the child will precisely | 
|  | follow the lifetime of its new parent.  An arena would require you to either | 
|  | perform a deep copy or extend the lifetime. | 
|  |  | 
|  | **thread-compatible vs. thread-safe arena** | 
|  |  | 
|  | * A thread-safe arena (as in C++) is safer and easier to use.  A thread-compatible | 
|  | arena requires that the user prove that the arena cannot be used concurrently. | 
|  | * [Thread Sanitizer](https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual) | 
|  | is far more accessible than it was in 2014 (when C++ introduced a thread-safe | 
|  | arena).  We now have more tools at our disposal to ensure that we do not trigger | 
|  | data races in a thread-compatible arena like upb's. | 
|  | * Thread-compatible arenas are more performant, since allocation pays no | 
|  | thread-safety overhead. | 
|  | * Thread-compatible arenas have a far simpler implementation.  The C++ thread-safe | 
|  | arena relies on thread-local variables, which introduce complications on some | 
|  | platforms.  It also requires far more subtle reasoning for correctness and | 
|  | performance. | 
|  |  | 
|  | **fuse vs. no fuse** | 
|  |  | 
|  | * The `upb_Arena_Fuse()` operation is a key part of how upb supports reparenting | 
|  | of messages when the parent may be on a different arena.  Without this, upb has | 
|  | no way of supporting `foo.bar = bar` in dynamic languages without performing a | 
|  | deep copy. | 
|  | * A downside of `upb_Arena_Fuse()` is that passing an arena to a function can allow | 
|  | that function to extend the lifetime of the arena in potentially | 
|  | unpredictable ways.  This can be prevented if necessary, as fuse can fail, e.g. if | 
|  | one arena has an initial block.  But this adds some complexity by requiring callers | 
|  | to handle the case where fuse fails. | 
|  |  | 
|  | ## Code Generation vs. Tables | 
|  |  | 
|  | The C++ protobuf library has always been built around code generation, while upb | 
|  | generates only tables.  In other words, `foo.pb.cc` files contain functions, | 
|  | whereas `foo.upb.c` files emit only data structures. | 
|  |  | 
|  | ### C++ | 
|  |  | 
|  | C++ generated code emits a large number of functions into `foo.pb.cc` files. | 
|  | An incomplete list: | 
|  |  | 
|  | * `FooMsg::FooMsg()` (constructor): initializes all fields to their default value. | 
|  | * `FooMsg::~FooMsg()` (destructor): frees any present child messages. | 
|  | * `FooMsg::Clear()`: clears all fields back to their default/empty value. | 
|  | * `FooMsg::_InternalParse()`: generated code for parsing a message. | 
|  | * `FooMsg::_InternalSerialize()`: generated code for serializing a message. | 
|  | * `FooMsg::ByteSizeLong()`: calculates serialized size, as a first pass before serializing. | 
|  | * `FooMsg::MergeFrom()`: copies/appends present fields from another message. | 
|  | * `FooMsg::IsInitialized()`: checks whether required fields are set. | 
|  |  | 
|  | This code lives in the `.text` section and contains function calls to the generated | 
|  | classes for child messages. | 
|  |  | 
|  | ### upb | 
|  |  | 
|  | upb does not generate any code into `foo.upb.c` files, only data structures.  upb uses a | 
|  | compact data table known as a *mini table* to represent the schema and all fields. | 
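
To illustrate what "only data structures" can mean in practice, here is a heavily simplified sketch. `ToyField`, `ToyTable`, `ToyParse`, and the `Point` message are hypothetical and do not reflect upb's real mini table layout. The per-message output is a constant table, and one generic function interprets that table for every message type. Only varint fields are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified stand-ins for a mini table.
enum class ToyType : uint8_t { kUInt32, kUInt64 };

struct ToyField {
  uint32_t number;  // field number from the .proto file
  uint16_t offset;  // byte offset of the field inside the message struct
  ToyType type;
};

struct ToyTable {
  const ToyField* fields;
  std::size_t field_count;
  std::size_t message_size;
};

// Reads one base-128 varint; returns nullptr on truncated input.
inline const uint8_t* ReadVarint(const uint8_t* p, const uint8_t* end,
                                 uint64_t* out) {
  uint64_t value = 0;
  for (int shift = 0; shift < 64 && p < end; shift += 7) {
    uint8_t byte = *p++;
    value |= static_cast<uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) {
      *out = value;
      return p;
    }
  }
  return nullptr;
}

// Generic parser shared by all message types: the table is its only
// knowledge of the schema. Fields not in the table are ignored.
inline bool ToyParse(const ToyTable& table, const uint8_t* data,
                     std::size_t size, void* msg) {
  std::memset(msg, 0, table.message_size);  // zero-initialize the message
  const uint8_t* p = data;
  const uint8_t* end = data + size;
  while (p < end) {
    uint64_t tag = 0;
    uint64_t value = 0;
    p = ReadVarint(p, end, &tag);
    if (p == nullptr || (tag & 7) != 0) return false;  // varint wire type only
    p = ReadVarint(p, end, &value);
    if (p == nullptr) return false;
    for (std::size_t i = 0; i < table.field_count; ++i) {
      const ToyField& f = table.fields[i];
      if (f.number != (tag >> 3)) continue;
      char* dst = static_cast<char*>(msg) + f.offset;
      if (f.type == ToyType::kUInt32) {
        uint32_t v = static_cast<uint32_t>(value);
        std::memcpy(dst, &v, sizeof(v));
      } else {
        std::memcpy(dst, &value, sizeof(value));
      }
    }
  }
  return true;
}

// What a generator would emit for one message in this toy model: a struct
// layout plus a constant table, and no functions.
struct Point {
  uint32_t x;
  uint64_t id;
};

static const ToyField kPointFields[] = {
    {1, offsetof(Point, x), ToyType::kUInt32},
    {2, offsetof(Point, id), ToyType::kUInt64},
};
static const ToyTable kPointTable = {kPointFields, 2, sizeof(Point)};
```

Adding another message type in this model costs only another table, while the parser's code size stays fixed.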
|  |  | 
|  | upb uses mini tables to perform all of the operations that would traditionally be done | 
|  | with generated code.  Revisiting the list from the previous section: | 
|  |  | 
|  | * `FooMsg::FooMsg()` (constructor): upb instead initializes all messages with `memset(msg, 0, size)`. | 
|  | Non-zero defaults are injected in the accessors. | 
|  | * `FooMsg::~FooMsg()` (destructor): upb messages are freed by freeing the arena. | 
|  | * `FooMsg::Clear()`: can be performed with `memset(msg, 0, size)`. | 
|  | * `FooMsg::_InternalParse()`: upb's parser uses mini tables as data, instead of generating code. | 
|  | * `FooMsg::_InternalSerialize()`: upb's serializer also uses mini tables instead of generated code. | 
|  | * `FooMsg::ByteSizeLong()`: upb performs serialization in reverse so that an initial pass is not required. | 
|  | * `FooMsg::MergeFrom()`: upb supports this via serialize+parse from the other message. | 
|  | * `FooMsg::IsInitialized()`: upb's encoder and decoder have special flags to check for required fields. | 
|  | A util library `upb/util/required_fields.h` handles the corner cases. | 
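
The `ByteSizeLong()` point can be sketched as follows. Length-delimited fields need their length written before their body, which normally forces a sizing pass first. Filling the buffer from the back means the body is already written by the time its length is needed. `ReverseWriter` and `EncodeNested` are hypothetical names, and this toy handles only varints and one level of nesting. It is not upb's encoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy encoder: fills a fixed-capacity buffer from the back.
class ReverseWriter {
 public:
  explicit ReverseWriter(std::size_t capacity)
      : buf_(capacity), pos_(capacity) {}

  // Prepends one varint. Returns false if the buffer is full.
  bool PutVarint(uint64_t v) {
    uint8_t tmp[10];
    std::size_t n = 0;
    do {
      uint8_t byte = v & 0x7f;
      v >>= 7;
      if (v != 0) byte |= 0x80;
      tmp[n++] = byte;
    } while (v != 0);
    if (n > pos_) return false;
    pos_ -= n;
    for (std::size_t i = 0; i < n; ++i) buf_[pos_ + i] = tmp[i];
    return true;
  }

  // Number of bytes written so far.
  std::size_t size() const { return buf_.size() - pos_; }

  std::vector<uint8_t> bytes() const {
    return std::vector<uint8_t>(
        buf_.begin() + static_cast<std::ptrdiff_t>(pos_), buf_.end());
  }

 private:
  std::vector<uint8_t> buf_;
  std::size_t pos_;
};

// Encodes outer { inner (field 1) { value (field 1) = inner_value } }.
// Returns an empty vector if the buffer is too small.
inline std::vector<uint8_t> EncodeNested(uint64_t inner_value) {
  ReverseWriter w(64);
  // Inner body first, last byte to first byte.
  bool ok = w.PutVarint(inner_value) && w.PutVarint((1u << 3) | 0u);
  // The body is complete, so its length is known without a sizing pass.
  std::size_t inner_len = w.size();
  ok = ok && w.PutVarint(inner_len) && w.PutVarint((1u << 3) | 2u);
  return ok ? w.bytes() : std::vector<uint8_t>();
}
```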
|  |  | 
|  | ### Comparison | 
|  |  | 
|  | If we compare compiled code size, upb is far smaller.  Here is a comparison of the code | 
|  | size of a trivial binary that does nothing but a parse and serialize of `descriptor.proto`. | 
|  | This means we are seeing both the overhead of the core library itself as well as the | 
|  | generated code (or table) for `descriptor.proto`.  (For extra clarity, we should break | 
|  | this down by generated code vs. core library in the future.) | 
|  |  | 
|  |  | 
|  | | Library         | `.text` | `.data` | `.bss` | | 
|  | |-----------------|---------|---------|--------| | 
|  | | upb             |  26Ki   | 0.6Ki   | 0.01Ki | | 
|  | | C++ (lite)      | 187Ki   | 2.8Ki   | 1.25Ki | | 
|  | | C++ (code size) | 904Ki   | 6.1Ki   | 1.88Ki | | 
|  | | C++ (full)      | 983Ki   | 6.1Ki   | 1.88Ki | | 
|  |  | 
|  | "C++ (code size)" refers to protos compiled with `optimize_for = CODE_SIZE`, a mode | 
|  | in which the generated classes rely on shared, reflection-based code for operations | 
|  | such as parsing and serialization, in an attempt to make the generated code size | 
|  | smaller (however, it requires the full runtime instead of the lite runtime). | 
|  |  | 
|  | ## Bifurcated vs. Optional Reflection | 
|  |  | 
|  | upb and C++ protos both offer reflection without making it mandatory.  However, | 
|  | the models for enabling/disabling reflection are very different. | 
|  |  | 
|  | ### C++ | 
|  |  | 
|  | C++ messages offer full reflection by default.  Messages in C++ generally | 
|  | derive from `Message`, and the base class provides a member function | 
|  | `const Reflection* Message::GetReflection() const` which returns the reflection object. | 
|  |  | 
|  | It follows that any message deriving from `Message` will always have reflection | 
|  | linked into the binary, whether or not the reflection object is ever used. | 
|  | Because `GetReflection()` is a function on the base class, it is not possible | 
|  | to statically determine if a given message's reflection is used: | 
|  |  | 
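|  | ```c++ | 
|  | #include "google/protobuf/message.h" | 
|  | using google::protobuf::Message; | 
|  | using google::protobuf::Reflection; | 
|  |  | 
|  | const Reflection* GetReflection(const Message& message) { | 
|  |   // Can refer to any message in the whole binary. | 
|  |   return message.GetReflection(); | 
|  | } | 
|  | ``` | 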
|  |  | 
|  | The C++ library does provide a way of omitting reflection: `MessageLite`.  We can | 
|  | cause a message to be lite in two different ways: | 
|  |  | 
|  | * `optimize_for = LITE_RUNTIME` in a `.proto` file will cause all messages in that | 
|  | file to be lite. | 
|  | * `lite` as a codegen param: this will force all messages to lite, even if the | 
|  | `.proto` file does not have `optimize_for = LITE_RUNTIME`. | 
|  |  | 
|  | A lite message will derive from `MessageLite` instead of `Message`.  Since | 
|  | `MessageLite` has no `GetReflection()` function, this means no reflection is | 
|  | available, so we can avoid taking the code size hit. | 
|  |  | 
|  | ### upb | 
|  |  | 
|  | upb does not have the `Message` vs. `MessageLite` bifurcation.  There is only one | 
|  | kind of message type, `upb_Message`, which means there is no need to configure in | 
|  | a `.proto` file which messages will need reflection and which will not. | 
|  | Every message has the *option* to link in reflection from a separate `foo.upbdefs.o` | 
|  | file, without needing to change the message itself in any way. | 
|  |  | 
|  | upb does not provide the equivalent of `Message::GetReflection()`: there is no | 
|  | facility for retrieving the reflection of a message whose type is not known statically. | 
|  | It would be possible to layer such a facility on top of the upb core, though this | 
|  | would probably require some kind of code generation. | 
|  |  | 
|  | ### Comparison | 
|  |  | 
|  | * Most messages in C++ will not bother to declare themselves as "lite".  This means | 
|  | that many C++ messages will link in reflection even when it is never used, bloating | 
|  | binaries unnecessarily. | 
|  | * `optimize_for = LITE_RUNTIME` is difficult to use in practice, because it prevents | 
|  | any non-lite protos from `import`ing that file. | 
|  | * Forcing all protos to lite via a codegen parameter (for example, when building for | 
|  | mobile) is more practical than `optimize_for = LITE_RUNTIME`.  But this will break | 
|  | the compile for any code that tries to upcast to `Message`, or tries to use a | 
|  | non-lite method. | 
|  | * The one major advantage of the C++ model is that it can support `msg.DebugString()` | 
|  | on a type-erased proto.  For upb you have to explicitly pass the `upb_MessageDef*` | 
|  | separately if you want to perform an operation like printing a proto to text format. | 
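
The difference in API shape from the last bullet can be sketched with toy types. Everything below (`ToyDef`, `ToyMessage`, `ToyFoo`, `ToyRawMessage`, and the two `Describe*` functions) is hypothetical and only models the two calling conventions. It does not use either library.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a schema description (descriptor or message def).
struct ToyDef {
  std::string full_name;
};

// Model 1, in the style of C++ protobuf: the object can report its own
// schema, so a function needs only the type-erased base reference.
class ToyMessage {
 public:
  virtual ~ToyMessage() = default;
  virtual const ToyDef* GetDef() const = 0;
};

class ToyFoo : public ToyMessage {
 public:
  const ToyDef* GetDef() const override {
    static const ToyDef def{"pkg.Foo"};
    return &def;
  }
};

inline std::string DescribeErased(const ToyMessage& msg) {
  return "message of type " + msg.GetDef()->full_name;
}

// Model 2, in the style of upb: the message is plain data with no link back
// to its schema, so the caller must pass the def alongside it.
struct ToyRawMessage {
  unsigned char data[16];
};

inline std::string DescribeExplicit(const ToyRawMessage* msg,
                                    const ToyDef* def) {
  (void)msg;  // a real printer would read fields from msg using def
  return "message of type " + def->full_name;
}
```

In the first model, the schema data is reachable from every message object, so it must be linked in whenever the class is. In the second, a binary that never calls `DescribeExplicit` never needs a `ToyDef` at all.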
|  |  | 
|  | ## Explicit Registration vs. Globals | 
|  |  | 
|  | TODO |