This document is an introduction to the Protobuf Editions project, an ambitious re-imagining of how we migrate Protobuf users into the future.
Enable incremental evolution of Protobuf across the entire ecosystem without introducing permanent forks in the Protobuf language.
syntax
= ...
with edition = ...
..proto
files, so their design is optimized to minimize diffs to .proto
files while permitting fine-grained control..proto
file basis; projects can upgrade incrementally.proto2
/proto3
distinction is going away.Arguably the biggest hard-earned lesson among Software Foundations is that successful migrations are incremental. Most of our experience with these has been for internal migrations. Externally, progress has often ossified because of a lack of established evolution mechanisms. More recently large projects have started planning incremental evolution into their structure. For example, Carbon is heavily focused on evolution as a core precept, and Rust has built language evolution via editions into its core design..
Protobuf is one of Google's oldest and most successful toolchain projects. However, it was designed before we learned and internalized this lesson, making modernization difficult and haphazard. We still have required
and group
, packed
is not everywhere, and string accessors in C++ still return const std::string&
. The last radical change to Protobuf (syntax = "proto3";
) split the ecosystem.
Editions and features are new language features that will allow us to incrementally evolve Protobuf into the future. This will be done by introducing a new syntax
, hopefully the last syntax addition we will ever need.
This high-level document is intended as an introduction to Protobuf Editions for engineers not familiar with the background and the set of tradeoffs that lead us here. Low-level technical details are skipped in preference to describing the kernel of our proposed design. This document reflects the approximate consensus of protobuf-team members who have been developing Protobuf Editions, but please beware: many open questions remain.
A feature, in the narrow context of Protobuf Editions, is an option
on any syntax entity of a .proto
file that has the following properties:
features
, which is present on every syntax entity (file, message, enum, field, etc). It can be of any type, but bool
and enum
are the most common.Features allow us to control the behavior of protoc
, its backends, and the Protobuf runtimes at arbitrary granularity. This is critical for large-scale changes: if a message has few usages, features can be changed at a bigger scope, minimizing diff churn, but if it has heavy usage and the CL to migrate a single field is large, cleanups can happen at the field level, as necessary.
Features won't change a message’s serialization formats (binary, text, or json) in incompatible ways except for extreme circumstances that will always be managed directly by protobuf-team. It is critical for migrations that any behavioral change coming from a feature is the result of a textual change to a .proto
file (either an edition bump or a feature change).
ctype
is an existing field option that looks exactly like a feature: it controls the behavior of the codegen backend, although it does not have the nice ratcheting properties of editions.
Because features can be extensions, language backends can specify language-scoped features. For example, [ctype = CORD]
could instead be phrased as [features.(pb.cpp).string_type = CORD]
. Codegen backends own the definitions of their features.
An edition is a collection of defaults for features understood by protoc
and its backends. Editions are year-numbered, although we have defined a breakout in case we need multiple editions in a particular year.
Instead of writing syntax = "...";
, a Protobuf Editions-enabled .proto
file begins with edition = "2022";
or similar. edition
implies syntax = "editions";
, and the syntax
keyword itself becomes deprecated. This is to ensure that old tools not owned by protobuf-team, which only work for old Protobuf syntaxes, crash or fail quickly and noticeably, instead of wandering into a descriptor that they cannot understand (we will attempt to migrate what we can, of course).
protoc
specifies which editions it understands, and will reject .proto
files “from the future”, since it cannot meaningfully parse them. protoc
backends, which can specify their own set of language-scoped features, must advertise the defaults for a particular edition that they understand (and reject editions that they don‘t). Runtimes must be able to handle descriptors “from the future”; this only means that upon encountering a descriptor with an edition or feature it does not understand, there must be a reasonable fallback for the runtime’s behavior.
Editions provide the fundamental increments for the lifecycle of a feature. At this point it is important to reiterate that most features will be specific to particular code generators. What follows is an example life cycle for a theoretical feature–features.(pb.cpp).opaque_repeated_fields
.
Edition “2025” creates features.(pb.cpp).opaque_repeated_fields
with a default value of false
. This value is equivalent to the behavior from editions less than “2025”.
a. The migration to edition “2025” across google will move very fast as it is a no-op.
Migration begins for features.(pb.cpp).opaque_repeated_fields
(each change in this migration will add features.(pb.cpp).opaque_repeated_fields = true
and be paired with code changes required to C++ code). It is not anticipated that protos shared between repos will undergo field by field migrations like this as that would cause a large stream of breaking changes, see Protobuf Editions for schema producers for more details.
Edition “2027” switches the default of features.(pb.cpp).opaque_repeated_fields
to true
.
a. The migration to “2027” will remove explicit uses of features.(pb.cpp).opaque_repeated_fields = true
and add explicit uses of features.(pb.cpp).opaque_repeated_fields = false
where they were implicit before. As above, this migration will be a no-op, so it will move very fast.
b. Externally, we will release tools and migration guides for OSS customers. The tools will not be fully turnkey, but should provide a strong starting point for user migrations.
Migration continues for features.(pb.cpp).opaque_repeated_fields
(each change in this migration will remove features.(pb.cpp).opaque_repeated_fields = false
and be paired with code changes required to C++ code).
At some point, usage will be officially roped off internally, and externally.
a. Internally, features.(pb.cpp).opaque_repeated_fields
usage will be blocked with allowlists while we remove the hardest to migrate case.
b. Externally, features.(pb.cpp).opaque_repeated_fields
will be marked deprecated in a public edition and removed in a later one. When a feature is removed, the code generators for that behavior and the runtime libraries that support it may also be removed. In this hypothetical, that might be deprecated in “2029” and removed in “2031”. Any release that removes support for a feature would be a major version bump.
The key point to note here is that any .proto
file that does not use deprecated features has a no-op upgrade from one edition to the next and we will provide tools to effect that upgrade. Internal users will be migrated centrally before a feature is deprecated. External users will have the full window of the Google migration as well as the deprecation window to upgrade their own code.
It is also important to note that external users will not receive compiler warnings until the feature is actually deprecated, so we provide a period of deprecation to ensure that they have time to update their code before forcing them to upgrade for an edition update.
Separately from feature evolution, protoc
itself may remove support for old editions entirely after a suitably long window (like 10 years).
The first edition of Protobuf Editions, the so-called “edition zero”, will effectively be a “proto4
” that introduces the new syntax, and merges the semantics of proto2
and proto3
. In editions mode, everything that was possible in proto2
and proto3
will be possible, and the handful of irreconcilable differences will be expressed as features.
For example, whether values not specified in an enum
go into unknown fields vs producing an enum value outside of the bounds of the specified values in the .proto
file (i.e., so-called closed and open enums) will be controlled by feature.enum = OPEN
or feature.enum = CLOSED
.
Edition Zero should be viewed as the “completion” of the union of proto2
and proto3
: it contains both syntaxes as subsets (although with different spellings to disambiguate things) as well as new behavior that was previously inexpressible but which is an obvious consequence of allowing everything from both. For example, proto3
-style non-optional singular fields could allow non-zero defaults.
Edition Zero is designed in such a way that we can mechanically migrate an arbitrary .proto
file from either proto2
or proto3
with no behavioral changes, by replacing syntax
with edition
and adding features in the appropriate locations.
This will form the foundation of Protobuf Editions and the torrent of parallel migrations that will follow.
This will manifest as a handful of new option
s appearing at the top of your files. Going forward, expect new options
to appear and disappear from your .proto
files as LSCs march across the codebase. We intend to minimize disruption, and you should be able to safely ignore them.
In general, you should not need to add option
s yourself unless we say so in documentation. We will try to make sure tooling recommends the latest edition when creating new files.
Everything expressible today will remain so in Edition Zero. Some syntax will change: we will have only one way of spelling a singular field (with optional
vs. the proto3
behavior vs. required
controlled by a feature), group
s will turn into sub message fields with a special encoding.
Long-term bifurcation of the language has resulted in significant damage to the. ecosystem and engineers' mental model of Protobuf. There are features we think are questionable, too, and we want to remove them. But we need to break some eggs to make an omelet.
As stewards of the Protobuf language, we believe this is the best way to get rid of features that were a good idea at the time, but which history has shown to have had poor outcomes.
We plan to upgrade reflection to be feature-aware in a way that minimizes code we need to change. We do not expect anyone to implement feature-inheritance logic themselves; feature inheritance should be fully transparent to users, behaving as if features had been placed explicitly everywhere. (Owners of code generators should be the only ones that need to know how to correctly propagate features.)
We will be partnering with use-cases that are known risks for migration, such as storage providers, to minimize toil and disruption on all sides.
Generally, the owner of the relevant component that ingests a particular feature (protoc
or the appropriate language backend) will own it. We will try to make it as straightforward as we can to add a language-scoped feature, but it may require some degree of coordination with us to get it into an edition.
Even if it‘s about one of protobuf-team’s backends, we'd love to hear what you think we can fix, within the constraints of editions.
We want to share a variant of this document with the OSS community. We plan to publish migration guides and, where feasible, any migration tooling, such as the proto2
/proto3
-> edition
migrator.
As stated above, we want to minimize friction for non-protobuf-team-owned backends, and this ties into helping third party code generators minimize their pain.
Yes, but you get to keep both pieces. Failing to migrate off of old use-cases and into newer versions that fix known defects is a risk for the entire ecosystem: C++'s disastrous standardization process is a solemn warning of failing to do so.
Trying to stay on proto2
or proto3
will eventually cease to be supported, and old editions (e.g. 5 years) will also cease to be supported. Evolution is at the heart of Protobuf, and we want to make it as easy as possible for users to keep up with our progress towards a better Protobuf.
An incomplete list of ideas, which should be taken as non-committal.
required
completely by making a particular field be optional but serialized unconditionally.string
require UTF-8 checking, and all uses that don't want/need it bytes
, fulfilling the original proto3
vision.string
and bytes
accessor in C++ return absl::string_view
, unlocking performance optimizations.repeated
fields packed
, improving throughput.enum
enumerators in C++ use kName
instead of NAME
.enum
declarations in C++ into scoped enum class
.CTYPE
into a language-scoped feature.