 <!---
 This document contains embedded graphviz diagrams inside ```dot blocks.

 To convert it to rendered form using render.py:
   $ ./render.py wrapping-upb.in.md

 You can also live-preview this document with all diagrams using Markdown Preview Enhanced
 in Visual Studio Code:
   https://marketplace.visualstudio.com/items?itemName=shd101wyy.markdown-preview-enhanced
 --->

 # Wrapping upb in other languages

 upb is a C kernel that is designed to be wrapped in other languages.  This is a
 guide for creating a new protobuf implementation based on upb.

 ## What you will need

 There are certain things that the language runtime must provide in order to
 wrap upb.

 1. **Finalizers, Destructors, or Cleaners**: This is one unavoidable
    requirement: the language *must* provide finalizers or destructors of some sort.
    There must be a way of calling a C function when the language GCs or otherwise
    destroys an object.  We don't care much whether it is a finalizer, a destructor,
    or a cleaner, as long as it gets called eventually when the object is destroyed.
    Without finalizers, we would have no way of cleaning up upb data and everything
    would leak.
 2. **HashMap with weak values**: This is not an absolute requirement, but in
    languages with automatic memory management, we generally end up wanting a
    hash map with weak values to act as a `upb_msg* -> wrapper` object cache.
    We want the values to be weak (not the keys).
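To make the "weak values" requirement concrete, here is a minimal sketch of such a cache, written in plain C so that it stands alone. It is not upb API: `Wrapper`, `ObjCache`, `wrapper_get`, and `wrapper_finalize` are hypothetical names, and a tiny linear table stands in for the hash map. The cache never owns a wrapper; instead, the wrapper's finalizer removes its own entry, which is one way to emulate weak values when the runtime has no built-in weak map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SLOTS 16

/* Hypothetical language-side wrapper around one C object. */
typedef struct {
  const void *c_obj;
} Wrapper;

/* Maps C object address -> wrapper.  The cache never owns a wrapper, which
 * is what "weak values" means here. */
typedef struct {
  Wrapper *slots[CACHE_SLOTS];
} ObjCache;

static Wrapper *cache_lookup(const ObjCache *c, const void *c_obj) {
  for (size_t i = 0; i < CACHE_SLOTS; i++) {
    if (c->slots[i] && c->slots[i]->c_obj == c_obj) return c->slots[i];
  }
  return NULL;
}

/* Returns the existing wrapper for c_obj, or creates and caches a new one. */
static Wrapper *wrapper_get(ObjCache *c, const void *c_obj) {
  Wrapper *w = cache_lookup(c, c_obj);
  if (w) return w;
  w = malloc(sizeof(*w));
  assert(w != NULL);
  w->c_obj = c_obj;
  for (size_t i = 0; i < CACHE_SLOTS; i++) {
    if (!c->slots[i]) {
      c->slots[i] = w;
      return w;
    }
  }
  abort(); /* Table full; a real hash map would grow instead. */
}

/* What the wrapper's finalizer would do: drop the cache entry, then free. */
static void wrapper_finalize(ObjCache *c, Wrapper *w) {
  for (size_t i = 0; i < CACHE_SLOTS; i++) {
    if (c->slots[i] == w) c->slots[i] = NULL;
  }
  free(w);
}
```

Asking for the same C object twice yields the same wrapper, so object identity is preserved on the language side. Once the wrapper is finalized, the entry disappears and the cache does not keep anything alive.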

 ## Reflection vs. Direct Access

 Each language wrapping upb gets to decide whether it will access messages
 through *reflection* or through *direct access*.  This decision has some deep
 implications that will affect the design, features, and performance of your
 library.

 ### Reflection

 The simplest option is to load full reflection data into the upb library at
 runtime.  You can load reflection data using serialized descriptors, which are a
 stable and widely supported format across all protobuf tooling.

 ```c
   // A upb_symtab is a dynamic container that we can load reflection data into.
   upb_symtab* symtab = upb_symtab_new();

   // We load reflection data via a serialized descriptor.  The code generator
   // for your language should embed serialized descriptors into your generated
   // files. For each generated file loaded by your library, you can add the
   // serialized descriptor to the symtab as shown.
   upb_arena *tmp = upb_arena_new();
   google_protobuf_FileDescriptorProto* file =
       google_protobuf_FileDescriptorProto_parse(desc_data, desc_size, tmp);
   if (!file || !upb_symtab_addfile(symtab, file, NULL)) {
     // Handle error.
   }
   upb_arena_free(tmp);

   // At application exit, we free the symtab.
   upb_symtab_free(symtab);
 ```

 The `upb_symtab` will give you full access to all data from the `.proto` file,
 including convenient APIs like looking up a field by name. It will allow you to
 use JSON and text format.  The APIs for accessing a message through reflection
 are simple and well-supported.  These APIs cleanly encapsulate upb's internal
 implementation details.

 ```c
   upb_symtab* symtab = BuildSymtab();

   // Look up a message type in the symtab.
   const upb_msgdef* m = upb_symtab_lookupmsg(symtab, "FooMessage");

   // Construct a new message of this type, via reflection.
   upb_arena *arena = upb_arena_new();
   upb_msg *msg = upb_msg_new(m, arena);

   // Set a message field using reflection.
   const upb_fielddef* f = upb_msgdef_ntofz(m, "bar_field");
   upb_msgval val = {.int32_val = 123};
   upb_msg_set(msg, f, val, arena);

   // Free the message and symtab.
   upb_arena_free(arena);
   upb_symtab_free(symtab);
 ```

 Using reflection is a natural choice in heavily reflective, dynamic runtimes
 like Python, Ruby, PHP, or Lua.  These languages generally perform method
 dispatch through a dictionary/hash table anyway, so we are not adding any extra
 overhead by using upb's hash table to look up fields by name at field access
 time.

 ### Direct Access

 Using reflection has some downsides.  Reflection data is relatively large, both
 in your binary (at rest) and in RAM (at runtime).  It contains names of
 everything, and these names will be exposed in your binary.  Reflection APIs for
 accessing a message will have more overhead than you might want, especially if
 crossing the FFI boundary for your language runtime imposes significant
 overhead.

 We can reduce these overheads by using *direct access*.  upb's parser and
 serializer do not actually require full reflection data; they use a more compact
 data structure known as **mini tables**.  Mini tables will take up less space
 than reflection, both in the binary and in RAM, and they will not leak field
 names.  Mini tables will let us parse and serialize binary wire format data
 without reflection.

 ```c
   // TODO: demonstrate upb API for loading mini table data at runtime.
   // This API does not exist yet.
 ```

 To access messages themselves without the reflection API, we will be using
 different, lower-level APIs that will require you to supply precise data such as
 the offset of a given field.  This is information that will come from the upb
 compiler framework, and the correctness (and even memory safety!) of the program
 will rely on you passing these values through from the upb compiler libraries to
 the upb runtime correctly.

 ```c
   // TODO: demonstrate using low-level APIs for direct field access.
   // These APIs do not exist yet.
 ```
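Until those APIs exist, the general idea of offset-based access can be illustrated with a standalone sketch. Everything here is an assumption made for illustration: `FakeFooMsg` is a hypothetical layout standing in for upb's in-memory format, and `read_int64_at` and `FakeFooMsg_foo` are hypothetical helpers, not upb functions. The value produced by `offsetof` plays the role of the offset that the upb compiler libraries would supply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory layout; a stand-in for whatever layout upb chooses. */
typedef struct {
  int32_t bar;
  int64_t foo;
} FakeFooMsg;

/* Reads an int64 field located `offset` bytes from the start of the message.
 * Nothing checks that `offset` is correct: a wrong value reads garbage or
 * out-of-bounds memory, which is why the offset must be passed through
 * unchanged from the code generator. */
static int64_t read_int64_at(const void *msg, size_t offset) {
  int64_t value;
  assert(msg != NULL);
  memcpy(&value, (const char *)msg + offset, sizeof(value));
  return value;
}

/* What a generated accessor might look like: the offset is a baked-in
 * constant, so no name lookup happens at field access time. */
static int64_t FakeFooMsg_foo(const void *msg) {
  return read_int64_at(msg, offsetof(FakeFooMsg, foo));
}
```

The accessor compiles down to a single load at a fixed offset, which is the source of both the speed and the fragility described above.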

 It can even be possible in certain circumstances to bypass the upb API completely
 and access raw field data directly at a given offset, using unsafe APIs like
 `sun.misc.Unsafe`.  This can theoretically allow for field access that is no
 more expensive than referencing a struct/class field.

 ```java
 import sun.misc.Unsafe;

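 class FooProto {
   // Unsafe.getLong() is an instance method.  UnsafeAccess.get() is a
   // hypothetical helper that returns the Unsafe singleton.
   private static final Unsafe UNSAFE = UnsafeAccess.get();
   private final long addr;
   private final Arena arena;

   // Accessor that a Java library built on upb could conceivably generate.
   long getFoo() {
     // The offset 1234 came from the upb compiler library, and was injected by the
     // Java+upb code generator.
     return UNSAFE.getLong(addr + 1234);
   }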
 }
 ```

 It is always possible to load reflection data as desired, even if your library
 is designed primarily around direct access.  Users who want to use JSON, text
 format, or reflection could potentially load reflection data from separate
 generated modules, for cases where they do not mind the size overhead or the
 leaking of field names. You do not give up any of these possibilities by using
 direct access.

 However, using direct access does have some noticeable downsides.  It requires
 tighter coupling with upb's implementation details, as the mini table format is
 upb-specific and requires building your code generator against upb's compiler
 libraries.  Any direct access of memory is especially tightly coupled, and would
 need to be changed if upb's in-memory format ever changes.  It also is more
 prone to hard-to-debug memory errors if you make any mistakes.

 ## Memory Management

 One of the core design challenges when wrapping upb is memory management.  Every
 language runtime will have some memory management system, whether it is
 garbage collection, reference counting, manual memory management, or some hybrid
 of these.  upb is written in C and uses arenas for memory management, but upb is
 designed to integrate with a wide variety of memory management schemes, and it
 provides a number of tools for making this integration as smooth as possible.

 ### Arenas

 upb defines data structures in C to represent messages, arrays (repeated
 fields), and maps.  A protobuf message is a hierarchical tree of these objects.
 For example, a relatively simple protobuf tree might look something like this:

 ```dot {align="center"}
 digraph G {
   rankdir=LR;
   newrank=true;
   node [style="rounded,filled" shape=box colorscheme=accent8 fillcolor=1, ordering=out]
   upb_msg -> upb_msg2;
   upb_msg -> upb_array;
   upb_msg [label="upb Message" fillcolor=1]
   upb_msg2 [label="upb Message"];
   upb_array [label="upb Array"]
 }
 ```

 All upb objects are allocated from an arena.  An arena lets you allocate objects
 individually, but you cannot free individual objects; you can only free the arena
 as a whole.  When the arena is freed, all of the individual objects allocated
 from that arena are freed together.

 ```dot {align="center"}
 digraph G {
   rankdir=LR;
   newrank=true;
   subgraph cluster_0 {
     label = "upb Arena"
     graph[style="rounded,filled" fillcolor=gray]
     node [style="rounded,filled" shape=box colorscheme=accent8 fillcolor=1, ordering=out]
     upb_msg -> upb_array;
     upb_msg -> upb_msg2;
     upb_msg [label="upb Message" fillcolor=1]
     upb_msg2 [label="upb Message"];
     upb_array [label="upb Array"];
   }
 }
 ```
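The arena contract (allocate individually, free only as a whole) can be sketched with a toy allocator. This is not upb's implementation, which is considerably more efficient; `ToyArena`, `toy_arena_new`, `toy_arena_malloc`, and `toy_arena_free` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation.  The max_align_t member keeps
 * the payload that follows the header suitably aligned for any type. */
typedef union ToyBlock {
  union ToyBlock *next;
  max_align_t align;
} ToyBlock;

/* Toy arena: a singly linked list of every allocation made so far. */
typedef struct {
  ToyBlock *blocks; /* Most recent allocation first. */
} ToyArena;

static ToyArena *toy_arena_new(void) {
  return calloc(1, sizeof(ToyArena));
}

/* Allocates `size` bytes owned by the arena.  There is deliberately no
 * matching function for freeing a single object. */
static void *toy_arena_malloc(ToyArena *a, size_t size) {
  ToyBlock *b;
  assert(a != NULL);
  b = malloc(sizeof(ToyBlock) + size);
  if (!b) return NULL;
  b->next = a->blocks;
  a->blocks = b;
  return b + 1; /* Payload starts right after the header. */
}

/* Frees every object in the arena at once, then the arena itself.
 * Returns the number of objects that were freed. */
static size_t toy_arena_free(ToyArena *a) {
  size_t freed = 0;
  ToyBlock *b;
  assert(a != NULL);
  b = a->blocks;
  while (b) {
    ToyBlock *next = b->next;
    free(b);
    b = next;
    freed++;
  }
  free(a);
  return freed;
}
```

Because objects can only die together, pointers between objects in the same arena can never dangle while the arena is alive.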

 In simple cases, the entire tree of objects will all live in a single arena.
 This has the nice property that there cannot be any dangling pointers between
 objects, since all objects are freed at the same time.

 However, upb allows you to create links between any two objects, whether or
 not they are in the same arena.  The library does not know or care what arenas
 the objects are in when you create links between them.

 ```dot {align="center"}
 digraph G {
   rankdir=LR;
   newrank=true;
   subgraph cluster_0 {
     label = "upb Arena 1"
     graph[style="rounded,filled" fillcolor=gray]
     node [style="rounded,filled" shape=box colorscheme=accent8 fillcolor=1, ordering=out]
     upb_msg -> upb_array;
     upb_msg -> upb_msg2;
     upb_msg [label="upb Message 1" fillcolor=1]
     upb_msg2 [label="upb Message 2"];
     upb_array [label="upb Array"];
   }
   subgraph cluster_1 {
     label = "upb Arena 2"
     graph[style="rounded,filled" fillcolor=gray]
     node [style="rounded,filled" shape=box colorscheme=accent8 fillcolor=1]
     upb_msg3;
   }
   upb_msg2 -> upb_msg3;
   upb_msg3 [label="upb Message 3"];
 }
 ```

 When objects are in separate arenas, it is the user's responsibility to ensure
 that there are no dangling pointers.  In the example above, this means Arena 2
 must outlive Message 1 and Message 2.

 ### Integrating GC with upb

 In languages with automatic memory management, the goal is to handle all of the
 arenas behind the scenes, so that the user does not have to manage them manually
 or even know that they exist.

 We can achieve this goal if we set up the object graph in a particular way.  The
 general strategy is to create wrapper objects around all of the C objects,
 including the arena.  Our key goal is to make sure the arena wrapper is not
 GC'd until all of the C objects in that arena have become unreachable.

 For this example, we will assume we are wrapping upb in Python:

 ```dot {align="center"}
 digraph G {
   rankdir=LR;
   newrank=true;
   compound=true;

   subgraph cluster_1 {
     label = "upb Arena"
     graph[style="rounded,filled" fillcolor=gray]
     node [style="rounded,filled" shape=box colorscheme=accent8 fillcolor=1, ordering=out]
     upb_msg -> upb_array [style=dashed];
     upb_msg -> upb_msg2 [style=dashed];
     upb_msg [label="upb Message" fillcolor=1]
     upb_msg2 [label="upb Message"];
     upb_array [label="upb Array"]
     dummy [style=invis]
   }
   subgraph cluster_python {
     node [style="rounded,filled" shape=box colorscheme=accent8 fillcolor=2]
     peripheries=0
     py_upb_msg [label="Python Message"];
     py_upb_msg2 [label="Python Message"];
     py_upb_arena [label="Python Arena"];
   }
   py_upb_msg -> upb_msg [style=dashed];
   py_upb_msg2->upb_msg2 [style=dashed];
   py_upb_msg2 -> py_upb_arena [color=springgreen4];
   py_upb_msg -> py_upb_arena [color=springgreen4];
   py_upb_arena -> dummy [lhead=cluster_1, color=red];
   {
      rank=same;
      upb_msg;
      py_upb_msg;
   }
   {
      rank=same;
      upb_array;
      upb_msg2;
      py_upb_msg2;
   }
   {  rank=same;
      dummy;
      py_upb_arena;
   }
   dummy->upb_array [style=invis];
   dummy->upb_msg2 [style=invis];

   subgraph cluster_01 {
     node [shape=plaintext]
     peripheries=0
     key [label=<<table border="0" cellpadding="2" cellspacing="0" cellborder="0">
       <tr><td align="right" port="i1">raw ptr</td></tr>
       <tr><td align="right" port="i2">unique ptr</td></tr>
       <tr><td align="right" port="i3">shared (GC) ptr</td></tr>
       </table>>]
     key2 [label=<<table border="0" cellpadding="2" cellspacing="0" cellborder="0">
       <tr><td port="i1">&nbsp;</td></tr>
       <tr><td port="i2">&nbsp;</td></tr>
       <tr><td port="i3">&nbsp;</td></tr>
       </table>>]
     key:i1:e -> key2:i1:w [style=dashed]
     key:i2:e -> key2:i2:w [color=red]
     key:i3:e -> key2:i3:w [color=springgreen4]
   }
     key2:i1:w -> upb_msg [style=invis];
   {
     rank=same;
     key;
     upb_msg;
   }
 }
 ```

 In this example we have three different kinds of pointers:

 * **raw ptr**: This is a pointer that carries no ownership.
 * **unique ptr**: This is a pointer that has *unique ownership* of the target.  The owner
   will free the target in its destructor (or finalizer, or cleaner).  There can
   only be a single unique pointer to a given object.
 * **shared (GC) ptr**: This is a pointer that has *shared ownership* of the
   target.  Many objects can point to the target, and the target will be deleted
   only when all such references are gone.  In a runtime with automatic memory
   management (GC), this is a reference that participates in GC.  In Python such
   references use reference counting, but in other VMs they may use mark and
   sweep or some other form of GC instead.

 The Python Message wrappers have only raw pointers to the underlying message,
 but they contain a shared pointer to the arena that will ensure that the raw
 pointer remains valid.  Only when all message wrapper objects are destroyed
 will the Python Arena become unreachable, and the upb arena ultimately freed.
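The ownership graph above can be modeled in a few lines of C. This is only a sketch under stated assumptions: manual reference counting stands in for the host runtime's GC, a plain `malloc` block stands in for the upb arena, and `ArenaWrapper`, `MsgWrapper`, and their functions are hypothetical names rather than part of upb or of any existing binding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the "Python Arena" wrapper: it uniquely owns the C arena and
 * is itself shared by every message wrapper. */
typedef struct {
  int refcount;
  void *c_arena;    /* Unique ptr: freed when this wrapper dies. */
  bool *freed_flag; /* Lets the caller observe the free; illustration only. */
} ArenaWrapper;

/* Stand-in for a "Python Message" wrapper. */
typedef struct {
  void *c_msg;         /* Raw ptr into the arena; carries no ownership. */
  ArenaWrapper *arena; /* Shared ptr: keeps c_msg valid. */
} MsgWrapper;

static ArenaWrapper *arena_wrapper_new(bool *freed_flag) {
  ArenaWrapper *w = malloc(sizeof(*w));
  assert(w != NULL);
  w->refcount = 1;
  w->c_arena = malloc(64); /* Pretend this is a upb arena. */
  assert(w->c_arena != NULL);
  w->freed_flag = freed_flag;
  *freed_flag = false;
  return w;
}

/* Drops one shared reference.  The last drop acts as the finalizer and
 * frees the underlying C arena. */
static void arena_wrapper_unref(ArenaWrapper *w) {
  assert(w->refcount > 0);
  if (--w->refcount == 0) {
    free(w->c_arena);
    *w->freed_flag = true;
    free(w);
  }
}

static MsgWrapper *msg_wrapper_new(ArenaWrapper *arena, void *c_msg) {
  MsgWrapper *m = malloc(sizeof(*m));
  assert(m != NULL);
  m->c_msg = c_msg;
  m->arena = arena;
  arena->refcount++; /* The message wrapper shares ownership of the arena. */
  return m;
}

/* Finalizer for a message wrapper: releases its share of the arena. */
static void msg_wrapper_destroy(MsgWrapper *m) {
  arena_wrapper_unref(m->arena);
  free(m);
}
```

In this model the C arena is freed only when the last message wrapper is destroyed, even if the code that created the arena wrapper dropped its own reference much earlier.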

 ### Links between arenas with "Fuse"

 The design given above works well for objects that live in a single arena. But
 what if a user wants to create a link between two objects in different arenas?

 TODO

 ## UTF-8 vs. UTF-16

 TODO

 ## Object Cache

 TODO
