<!---
This document contains embedded graphviz diagrams inside ```dot blocks.

To convert it to rendered form using render.py:
  $ ./render.py wrapping-upb.in.md

You can also live-preview this document with all diagrams using Markdown Preview Enhanced
in Visual Studio Code:
  https://marketplace.visualstudio.com/items?itemName=shd101wyy.markdown-preview-enhanced
--->

# Building a protobuf library on upb

This is a guide for creating a new protobuf implementation based on upb. It
starts from the beginning and walks you through the process, highlighting
some important design choices you will need to make.

## Overview

A protobuf implementation consists of two main pieces:

1. a code generator, run at compile time, to turn `.proto` files into source
   files in your language (we will call this "zlang", assuming an extension of ".z").
2. a runtime component, which implements the wire format and provides the data
   structures for representing protobuf data and metadata.

<br/>

```dot {align="center"}
digraph {
    rankdir=LR;
    newrank=true;
    node [style="rounded,filled" shape=box]
    "foo.proto" -> protoc;
    "foo.proto" [shape=folder];
    protoc [fillcolor=lightgrey];
    protoc -> "protoc-gen-zlang";
    "protoc-gen-zlang" -> "foo.z";
    "protoc-gen-zlang" [fillcolor=palegreen3];
    "foo.z" [shape=folder];
    labelloc="b";
    label="Compile Time";
}
```

<br/>

```dot {align="center"}
digraph {
    newrank=true;
    node [style="rounded,filled" shape=box fillcolor=lightgrey]
    "foo.z" -> "zlang/upb glue (FFI)";
    "zlang/upb glue (FFI)" -> "upb (C)";
    "zlang/upb glue (FFI)" [fillcolor=palegreen3];
    labelloc="b";
    label="Runtime";
}
```

The parts in green are what you will need to implement.

Note that your code generator (`protoc-gen-zlang`) does *not* need to generate
any C code (e.g. `foo.c`). While upb itself is written in C, upb's parsers and
serializers are fully table-driven, which means there is never any need, or even
any benefit, to generating C code for each proto. upb is capable of full-speed
parsing even when schema data is loaded at runtime from strings embedded into
`foo.z`. This is a key benefit of upb compared with C++ protos, which have
traditionally relied on generated parsers in `foo.pb.cc` files to achieve full
parsing speed, and suffered a ~10x speed penalty in the parser when the schema
data was loaded at runtime.
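
To make this concrete, here is a minimal sketch of what the core of a `protoc-gen-zlang` plugin might emit: instead of C code, a `.z` file that embeds the schema as a string literal and hands it to the runtime glue at load time. Everything here is hypothetical (the zlang syntax, the `upb_glue.load_schema` call, and the `generate_z_file` helper); a real plugin would also read a `CodeGeneratorRequest` from protoc on stdin and write a `CodeGeneratorResponse` to stdout.

```python
def generate_z_file(
    proto_name: str, message_names: list[str], schema: str
) -> tuple[str, str]:
    """Return (output_file_name, file_contents) for a hypothetical zlang file.

    `schema` is an already-encoded schema string (for example a
    MiniDescriptor); no C code is generated.
    """
    # Escape the two characters that would end or corrupt the string literal.
    literal = schema.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        f"// Generated from {proto_name}. DO NOT EDIT.",
        # Hypothetical glue call: hands the schema string to upb at load time.
        f'let schema = upb_glue.load_schema("{literal}");',
    ]
    for name in message_names:
        # Each class is a thin wrapper; all parsing happens in upb's tables.
        lines.append(f'class {name} = schema.message_class("{name}");')
    return proto_name.removesuffix(".proto") + ".z", "\n".join(lines) + "\n"
```

The generated file stays small and language-native; the per-message parsing logic lives entirely in upb.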

## Prerequisites

There are a few things that the language runtime must provide in order to wrap
upb.

1. **FFI**: To wrap upb, your language must be able to call into a C API
   through a Foreign Function Interface (FFI). Most languages support FFI in
   some form, either through "native extensions" (in which you write some C
   code to implement new methods in your language) or through a direct FFI (in
   which you can call into regular C functions directly from your language
   using a special library).
2. **Finalizers, Destructors, or Cleaners**: The runtime must provide
   finalizers or destructors of some sort. There must be a way of triggering a
   call to a C function when the language garbage collects or otherwise
   destroys an object. We don't care much whether it is a finalizer, a
   destructor, or a cleaner, as long as it gets called eventually when the
   object is destroyed. upb allocates memory in C space, and a finalizer is our
   only way of making sure that memory is freed and does not leak.
3. **HashMap with weak values**: (optional) This is not a strong requirement,
   but it is sometimes helpful to have a global hashmap with weak values to act
   as a `upb_msg* -> wrapper` object cache. We want the values to be weak (not
   the keys). There is some question about whether we want to continue to use
   this pattern going forward.

## Reflection vs. MiniTables

The first key design decision you will need to make is whether your generated
code will access message data via reflection or MiniTables. Generally, more
dynamic languages will want to use reflection, while more static languages will
want to use MiniTables.

### Reflection

Reflection-based data access makes the most sense in highly dynamic language
interpreters, where method dispatch is generally resolved via strings and hash
table lookups.

In such languages, you can often implement a special method like `__getattr__`
(Python) or `method_missing` (Ruby) that receives the method name as a string.
Using upb's reflection, you can look up the field whose name matches the method
name, thereby using a hash table belonging to upb instead of one provided by the
language.

```python
class FooMessage:
    # Written in Python for illustration, but in practice we will want to
    # implement this in C for speed.
    def __getattr__(self, name):
        try:
            field = FooMessage.descriptor.fields_by_name[name]
        except KeyError:
            # __getattr__ must raise AttributeError for unknown names, or
            # hasattr() and getattr(obj, name, default) will misbehave.
            raise AttributeError(name) from None
        return field.get_value(self)
```

Using this design, we only need to attach a single `__getattr__` method to each
message class, instead of defining a getter/setter for each field. In this way
we can avoid duplicating hash tables between upb and the language interpreter,
reducing memory usage.
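
Because the `FooMessage` example depends on a real descriptor, here is a self-contained simulation of the same idea. `FieldDescriptor` and `Descriptor` are hypothetical stand-ins for upb's reflection objects (the real ones live in C), and the message stores its values keyed by field number rather than by name.

```python
class FieldDescriptor:
    """Hypothetical stand-in for a upb field definition."""

    def __init__(self, name: str, number: int, default):
        self.name = name
        self.number = number
        self.default = default

    def get_value(self, msg):
        # upb would read the field from C memory; here a dict keyed by
        # field number plays that role.
        return msg._values.get(self.number, self.default)


class Descriptor:
    """Hypothetical stand-in for a upb message definition."""

    def __init__(self, fields):
        # The single name -> field table shared by all field accesses.
        self.fields_by_name = {f.name: f for f in fields}


class FooMessage:
    descriptor = Descriptor([
        FieldDescriptor("bar", 1, 0),
        FieldDescriptor("baz", 2, ""),
    ])

    def __init__(self, **values):
        # Store raw values keyed by field number.
        self._values = {
            self.descriptor.fields_by_name[k].number: v for k, v in values.items()
        }

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for field names.
        try:
            field = type(self).descriptor.fields_by_name[name]
        except KeyError:
            raise AttributeError(name) from None
        return field.get_value(self)
```

The class defines no per-field accessors; every field read goes through the one `fields_by_name` table.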

Reflection-based access requires loading full reflection at runtime. Your
generated code will need to embed serialized descriptors (i.e. a serialized
message of `descriptor.proto`), which has some amount of size overhead and
embeds all message and field names in the binary. It also forces a hash table
lookup in the critical path of field access. If method calls in your language
already have this overhead, then this is no added burden, but for statically
dispatched languages it would cause extra overhead.

If we take this path to its logical conclusion, all class creation can be
performed fully dynamically, using only a binary descriptor as input. The
"generated code" becomes little more than an embedded descriptor plus a
library call to load it. Python has recently gone down this path. Generated
code now looks something like this:

```python
# main_pb2.py
from google.protobuf.internal import builder as _builder
from google.protobuf import descriptor_pool as _descriptor_pool

DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b"<...>")
_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, globals())
_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'main_pb2', globals())
```

This is all the runtime needs to create all of the classes for messages defined
in that serialized descriptor. This code has no pretense of readability, but
a separate `.pyi` stub file provides a fully expanded and readable list of all
methods a user can expect to be available:

```python
# main_pb2.pyi
from google.protobuf import descriptor as _descriptor
from google.protobuf import message as _message
from typing import ClassVar as _ClassVar, Optional as _Optional

DESCRIPTOR: _descriptor.FileDescriptor

class MyMessage(_message.Message):
    __slots__ = ["my_field"]
    MY_FIELD_FIELD_NUMBER: _ClassVar[int]
    my_field: str
    def __init__(self, my_field: _Optional[str] = ...) -> None: ...
```
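
The same idea can be sketched without protobuf at all. Below, a hypothetical `build_messages` function plays the role of `_builder.BuildTopDescriptorsAndMessages`: it takes a schema (here a plain dict standing in for a parsed descriptor, an assumption made for brevity) and uses Python's built-in `type()` to create one class per message, installing each into the caller's namespace.

```python
def _make_init(field_names):
    # Build an __init__ that accepts each field as an optional keyword.
    def __init__(self, **kwargs):
        for key in kwargs:
            if key not in field_names:
                raise TypeError(f"unknown field {key!r}")
        for name in field_names:
            setattr(self, name, kwargs.get(name))
    return __init__


def build_messages(schema: dict, namespace: dict) -> None:
    """Create one class per message in `schema` and store it in `namespace`.

    `schema` maps message names to {field_name: field_number} and stands in
    for a parsed descriptor.
    """
    for msg_name, fields in schema.items():
        names = tuple(fields)
        attrs = {"__slots__": names, "__init__": _make_init(names)}
        # Mirror the *_FIELD_NUMBER constants declared in the .pyi stub.
        for field_name, number in fields.items():
            attrs[f"{field_name.upper()}_FIELD_NUMBER"] = number
        namespace[msg_name] = type(msg_name, (), attrs)
```

Calling `build_messages({"MyMessage": {"my_field": 1}}, globals())` would define `MyMessage` at module level, much as the generated `main_pb2.py` does.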

To use reflection-based access:

1. Load and access descriptor data using the interfaces in `upb/def.h`.
2. Access message data using the interfaces in `upb/reflection.h`.

### MiniTables

MiniTables are a "lite" schema representation that is much smaller than
reflection. MiniTables omit names, options, and almost everything else from the
`.proto` file, retaining only enough information to parse and serialize the
binary wire format.

MiniTables can be loaded into upb through *MiniDescriptors*. MiniDescriptors are
a byte-oriented format that can be embedded into your generated code and passed
to upb to construct MiniTables. MiniDescriptors only use printable characters,
and therefore do not require escaping when embedding them into generated code
strings. Overall, MiniDescriptors are ~60x smaller than regular descriptors.
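
To see why the printable-only property matters, consider a hypothetical code-generator helper that embeds a schema blob in a C-style double-quoted string literal. Arbitrary binary data, such as a serialized `descriptor.proto` message, typically contains bytes that must be escaped, while printable characters (other than `"` and `\`) pass through unchanged. This sketch does not implement the MiniDescriptor encoding itself; it only shows the cost of embedding.

```python
def to_string_literal(blob: bytes) -> str:
    """Render `blob` as a C-style double-quoted string literal."""
    out = ['"']
    for b in blob:
        ch = chr(b)
        if ch in '"\\':
            out.append("\\" + ch)      # Escape quote and backslash.
        elif 0x20 <= b < 0x7F:
            out.append(ch)             # Printable ASCII: copied as-is.
        else:
            # Three-digit octal escape (4 characters). Unlike \x, a C octal
            # escape never swallows a following digit.
            out.append(f"\\{b:03o}")
    out.append('"')
    return "".join(out)
```

A printable input costs only the two surrounding quotes, while each non-printable byte quadruples in size.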

MiniTables and MiniDescriptors are a natural choice for compiled languages that
resolve method calls at compile time. For languages that are sometimes compiled
and sometimes interpreted, there might not be an obvious choice. When a method
call is statically bound, we want to remove as much overhead as possible,
especially from accessors. In the extreme case, we can use unsafe APIs to read
raw memory at a known offset:

```java
// Example of a maximally-optimized generated accessor.
class FooMessage {
  // Assumed to be initialized elsewhere: sun.misc.Unsafe has no public
  // constructor.
  private static sun.misc.Unsafe UNSAFE;
  // Raw address of the underlying upb message in C memory.
  private long ptr;

  public long getBarField() {
    // Using Unsafe should give us performance that is comparable to a
    // native member access.
    //
    // The constant "24" is obtained from upb at compile time.
    return UNSAFE.getLong(this.ptr + 24);
  }
}
```

This design is very low-level, and tightly couples the generated code to one
specific version of the schema and compiler. A slower but safer version would
look up a field by field number:

```java
// Example of a more loosely-coupled accessor.
class FooMessage {
  // Raw address of the underlying upb message in C memory.
  private long ptr;

  public long getBarField() {
    // The constant "2" is the field number. Internally this will look
    // up the number "2" in the MiniTable and use that to read the value
    // from the message.
    return upb.glue.getLong(this.ptr, 2);
  }
}
```
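
The trade-off between these two accessors can be simulated in Python with the standard `struct` module. Here a `bytearray` stands in for a message's C memory and a dict stands in for a MiniTable's field-number-to-offset mapping; both the layout and the offset 24 are assumptions chosen to match the Java examples, not upb's real layout.

```python
import struct

# Hypothetical "MiniTable": field number -> byte offset within the message.
MINI_TABLE = {1: 8, 2: 24}

# Simulated message memory: 32 bytes with field 2 (an int64) at offset 24.
message = bytearray(32)
struct.pack_into("<q", message, MINI_TABLE[2], 1234)


def get_bar_field_fixed_offset(msg: bytearray) -> int:
    # Tightly coupled: offset 24 is baked in at code-generation time and
    # silently reads the wrong bytes if the layout ever changes.
    return struct.unpack_from("<q", msg, 24)[0]


def get_long(msg: bytearray, mini_table: dict, field_number: int) -> int:
    # Loosely coupled: look up the offset by field number on every call.
    # This costs a lookup but keeps working if the layout changes.
    return struct.unpack_from("<q", msg, mini_table[field_number])[0]
```

Both accessors return 1234 for this layout; only the table-driven one would still be correct if a schema change moved field 2 to a different offset.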

One downside of MiniTables is that they cannot support parsing or serializing
JSON or TextFormat, because they do not contain the field names. It should be
possible to generate reflection data "on the side", into separate generated
code files, so that reflection is only pulled in if it is being used. However,
APIs to do this do not exist yet.

To use MiniTable-based access:

1. Load and access MiniDescriptor data using the interfaces in `upb/mini_table.h`.
2. Access message data using the interfaces in `upb/msg_accessors.h`.

## Memory Management

One of the core design challenges when wrapping upb is memory management. Every
language runtime will have some memory management system, whether it is
garbage collection, reference counting, manual memory management, or some hybrid
of these. upb is written in C and uses arenas for memory management, but upb is
designed to integrate with a wide variety of memory management schemes, and it
provides a number of tools for making this integration as smooth as possible.

### Arenas

upb defines data structures in C to represent messages, arrays (repeated
fields), and maps. A protobuf message is a hierarchical tree of these objects.
For example, a relatively simple protobuf tree might look something like this:

```dot {align="center"}
digraph G {
    rankdir=LR;
    newrank=true;
    node [style="rounded,filled" shape=box colorscheme=accent8 fillcolor=1, ordering=out]
    upb_msg -> upb_msg2;
    upb_msg -> upb_array;
    upb_msg [label="upb Message" fillcolor=1]
    upb_msg2 [label="upb Message"];
    upb_array [label="upb Array"]
}
```

All upb objects are allocated from an arena. An arena lets you allocate objects
individually, but you cannot free individual objects; you can only free the arena
as a whole. When the arena is freed, all of the individual objects allocated
from that arena are freed together.

```dot {align="center"}
digraph G {
    rankdir=LR;
    newrank=true;
    subgraph cluster_0 {
        label = "upb Arena"
        graph[style="rounded,filled" fillcolor=gray]
        node [style="rounded,filled" shape=box colorscheme=accent8 fillcolor=1, ordering=out]
        upb_msg -> upb_array;
        upb_msg -> upb_msg2;
        upb_msg [label="upb Message" fillcolor=1]
        upb_msg2 [label="upb Message"];
        upb_array [label="upb Array"];
    }
}
```
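
As a rough model of these semantics, here is a toy arena in Python. It is not upb's allocator (a real arena hands out raw memory carved from larger blocks); it only shows the contract: objects are allocated one at a time, there is no per-object free, and a single `free()` releases everything at once. The `Arena` class, its methods, and the integer "handles" standing in for C pointers are all hypothetical.

```python
class Arena:
    """Toy arena: objects are allocated one by one but freed all at once.

    Objects are referred to by integer handles, standing in for C pointers.
    """

    def __init__(self):
        self._objects = {}
        self._next = 0
        self.freed = False

    def alloc(self, value) -> int:
        if self.freed:
            raise RuntimeError("allocation from a freed arena")
        handle = self._next
        self._next += 1
        self._objects[handle] = value
        return handle

    def get(self, handle: int):
        # Reading through a handle after free() models a dangling pointer.
        if self.freed:
            raise RuntimeError("use after free: arena already freed")
        return self._objects[handle]

    def free(self) -> None:
        # The only way to release memory: every object goes at once.
        self._objects.clear()
        self.freed = True


# The tree from the diagram: one message pointing at a message and an array.
arena = Arena()
child = arena.alloc({"fields": {}})
array = arena.alloc([1, 2, 3])
root = arena.alloc({"fields": {"sub": child, "items": array}})
```

Because all three objects share one arena, a single `arena.free()` releases the whole tree and no handle can outlive its target.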

In simple cases, the entire tree of objects lives in a single arena.
This has the nice property that there cannot be any dangling pointers between
objects, since all objects are freed at the same time.

However, upb allows you to create links between any two objects, whether or
not they are in the same arena. The library does not know or care what arenas
the objects are in when you create links between them.

```dot {align="center"}
digraph G {
    rankdir=LR;
    newrank=true;
    subgraph cluster_0 {
        label = "upb Arena 1"
        graph[style="rounded,filled" fillcolor=gray]
        node [style="rounded,filled" shape=box colorscheme=accent8 fillcolor=1, ordering=out]
        upb_msg -> upb_array;
        upb_msg -> upb_msg2;
        upb_msg [label="upb Message 1" fillcolor=1]
        upb_msg2 [label="upb Message 2"];
        upb_array [label="upb Array"];
    }
    subgraph cluster_1 {
        label = "upb Arena 2"
        graph[style="rounded,filled" fillcolor=gray]
        node [style="rounded,filled" shape=box colorscheme=accent8 fillcolor=1]
        upb_msg3;
    }
    upb_msg2 -> upb_msg3;
    upb_msg3 [label="upb Message 3"];
}
```

When objects are on separate arenas, it is the user's responsibility to ensure
that there are no dangling pointers. In the example above, this means Arena 2
must outlive Message 1 and Message 2.
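
A toy model makes this lifetime rule concrete. In the hypothetical Python classes below, freeing an arena just marks it as freed, and following a link checks whether the target's arena is still alive. Real C code performs no such check, so the same mistake in upb would be silent undefined behavior rather than an exception.

```python
class Arena:
    """Toy arena whose objects become invalid once it is freed."""

    def __init__(self, name: str):
        self.name = name
        self.freed = False


class Message:
    def __init__(self, arena: Arena, label: str):
        self.arena = arena
        self.label = label
        self.sub = None  # A link to another Message, in any arena.

    def follow(self) -> "Message":
        # Dereferencing a link into a freed arena models a dangling pointer.
        target = self.sub
        if target.arena.freed:
            raise RuntimeError(f"dangling pointer into {target.arena.name}")
        return target


# The object graph from the diagram above.
arena1, arena2 = Arena("Arena 1"), Arena("Arena 2")
msg1 = Message(arena1, "Message 1")
msg2 = Message(arena1, "Message 2")
msg3 = Message(arena2, "Message 3")
msg1.sub = msg2
# Linking across arenas is allowed; nothing checks which arena msg3 is in.
msg2.sub = msg3
```

Freeing Arena 2 while Message 1 or Message 2 is still in use breaks the chain `msg1 -> msg2 -> msg3`; freeing Arena 1 first is always safe.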

### Integrating GC with upb

In languages with automatic memory management, the goal is to handle all of the
arenas behind the scenes, so that the user does not have to manage them manually
or even know that they exist.

We can achieve this goal if we set up the object graph in a particular way. The
general strategy is to create wrapper objects around all of the C objects,
including the arena. Our key goal is to make sure the arena wrapper is not
GC'd until all of the C objects in that arena have become unreachable.

For this example, we will assume we are wrapping upb in Python:

```dot {align="center"}
digraph G {
    rankdir=LR;
    newrank=true;
    compound=true;

    subgraph cluster_1 {
        label = "upb Arena"
        graph[style="rounded,filled" fillcolor=gray]
        node [style="rounded,filled" shape=box colorscheme=accent8 fillcolor=1, ordering=out]
        upb_msg -> upb_array [style=dashed];
        upb_msg -> upb_msg2 [style=dashed];
        upb_msg [label="upb Message" fillcolor=1]
        upb_msg2 [label="upb Message"];
        upb_array [label="upb Array"]
        dummy [style=invis]
    }
    subgraph cluster_python {
        node [style="rounded,filled" shape=box colorscheme=accent8 fillcolor=2]
        peripheries=0
        py_upb_msg [label="Python Message"];
        py_upb_msg2 [label="Python Message"];
        py_upb_arena [label="Python Arena"];
    }
    py_upb_msg -> upb_msg [style=dashed];
    py_upb_msg2->upb_msg2 [style=dashed];
    py_upb_msg2 -> py_upb_arena [color=springgreen4];
    py_upb_msg -> py_upb_arena [color=springgreen4];
    py_upb_arena -> dummy [lhead=cluster_1, color=red];
    {
        rank=same;
        upb_msg;
        py_upb_msg;
    }
    {
        rank=same;
        upb_array;
        upb_msg2;
        py_upb_msg2;
    }
    { rank=same;
        dummy;
        py_upb_arena;
    }
    dummy->upb_array [style=invis];
    dummy->upb_msg2 [style=invis];

    subgraph cluster_01 {
        node [shape=plaintext]
        peripheries=0
        key [label=<<table border="0" cellpadding="2" cellspacing="0" cellborder="0">
            <tr><td align="right" port="i1">raw ptr</td></tr>
            <tr><td align="right" port="i2">unique ptr</td></tr>
            <tr><td align="right" port="i3">shared (GC) ptr</td></tr>
            </table>>]
        key2 [label=<<table border="0" cellpadding="2" cellspacing="0" cellborder="0">
            <tr><td port="i1">&nbsp;</td></tr>
            <tr><td port="i2">&nbsp;</td></tr>
            <tr><td port="i3">&nbsp;</td></tr>
            </table>>]
        key:i1:e -> key2:i1:w [style=dashed]
        key:i2:e -> key2:i2:w [color=red]
        key:i3:e -> key2:i3:w [color=springgreen4]
    }
    key2:i1:w -> upb_msg [style=invis];
    {
        rank=same;
        key;
        upb_msg;
    }
}
```

In this example we have three different kinds of pointers:

* **raw ptr**: This is a pointer that carries no ownership.
* **unique ptr**: This is a pointer that has *unique ownership* of the target.
  The owner will free the target in its destructor (or finalizer, or cleaner).
  There can only be a single unique pointer to a given object.
* **shared (GC) ptr**: This is a pointer that has *shared ownership* of the
  target. Many objects can point to the target, and the target will be deleted
  only when all such references are gone. In a runtime with automatic memory
  management (GC), this is a reference that participates in GC. In Python such
  references use reference counting, but in other VMs they may use mark and
  sweep or some other form of GC instead.

The Python Message wrappers have only raw pointers to the underlying message,
but they contain a shared pointer to the arena that will ensure that the raw
pointer remains valid. Only when all message wrapper objects are destroyed
will the Python Arena become unreachable, and the upb arena ultimately freed.
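
Here is a minimal Python sketch of that object graph, assuming a hypothetical FFI layer: `c_arena_new` and `c_arena_free` simulate the C calls, and integers stand in for raw `upb_msg*` pointers. The Python Arena holds the unique pointer and frees it from a finalizer; each Python Message holds a raw pointer plus a shared (GC) reference to its Python Arena.

```python
import itertools
import weakref

_next_ptr = itertools.count(0x1000)
live_c_arenas = set()  # Simulated C heap: arenas that have not been freed.


def c_arena_new() -> int:
    # Hypothetical FFI call that creates a upb arena.
    ptr = next(_next_ptr)
    live_c_arenas.add(ptr)
    return ptr


def c_arena_free(ptr: int) -> None:
    # Hypothetical FFI call that frees a upb arena and everything in it.
    live_c_arenas.discard(ptr)


class PyArena:
    def __init__(self):
        # Unique pointer: only this wrapper frees the C arena, from its finalizer.
        self._ptr = c_arena_new()
        weakref.finalize(self, c_arena_free, self._ptr)

    def new_message(self) -> "PyMessage":
        # Simulate allocating a upb message inside this arena.
        return PyMessage(self, next(_next_ptr))


class PyMessage:
    def __init__(self, arena: PyArena, msg_ptr: int):
        self._msg = msg_ptr  # Raw pointer: no ownership of the upb message.
        self._arena = arena  # Shared (GC) pointer: keeps the arena alive.
```

Because the C arena is freed only from the arena wrapper's finalizer, and every message wrapper keeps that arena wrapper reachable, the raw pointer held by a live message wrapper cannot dangle.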

### Links between arenas with "Fuse"

The design given above works well for objects that live in a single arena. But
what if a user wants to create a link between two objects in different arenas?

TODO

## UTF-8 vs. UTF-16

TODO

## Object Cache

TODO