Approved: 2021-10-07
The Protobuf Team has been exploring potential facilities for introducing breaking API and semantic changes. This document proposes using these facilities to unify proto semantics from this point forward, while giving customers the ability to manage their project's specific needs more granularly.
We want to reduce the complications of API semantics that are coarsely managed through the syntax keyword, and instead default to converged proto2/proto3 semantics when opting into editions. Where needed, customers will be able to opt out of specific semantics that are incompatible with their existing usages, at a fine-grained level, using the new capabilities provided by editions + features.
As we introduce new facilities for managing breaking changes, we have an additional opportunity to cut over to, and finally realize, the long-standing vision of converging proto2/3 semantics as a natural extension of this work.
Doing this in lockstep with the introduction of editions provides the protobuf team with a few valuable outcomes:
Editions provide us with a more granular specification of intent than the existing coarse knob of “proto2” or “proto3.” By opting into our first edition, customers are upgrading to what we've referred to in the past as “converged semantics,” and if needed can reversibly downgrade back to proto2 or proto3 semantics by opting out of the specific features that are incompatible with their existing needs.
The protobuf team can avoid the n^2 complexity of considering how an edition/feature will interplay with an explicit syntax designation of “proto2” vs “proto3” for all impacted runtimes. This allows us to transition our thinking/support model to be explicitly feature-centric.
The introduction of editions will almost certainly cause a major version bump, which gives us ample justification to make breaking changes as we transition to this granular specification.
edition keyword to proto IDL
The edition keyword defines which semantic version a particular file, and all of its contents, adheres to as a baseline. Whenever a proto file declares an edition keyword, the file automatically defaults to converged proto2/3 semantics.
An edition's value is represented as a string, encoded by convention as a year.
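The Python sketch below makes the declaration concrete. It reads the top-level edition and syntax declarations from .proto source text and checks the year convention. It assumes an edition is declared in the same shape as today's syntax statement (edition = "2022";). That grammar, the example year, and the helper names read_header and looks_like_year are hypothetical and are not taken from protoc.

```python
import re
from typing import Dict

# Assumption: an edition is declared like today's syntax statement, e.g.
#   edition = "2022";
# This grammar is illustrative only; it is not protoc's parser.
_DECLARATION = re.compile(r'^\s*(edition|syntax)\s*=\s*"([^"]*)"\s*;', re.MULTILINE)


def read_header(source: str) -> Dict[str, str]:
    """Return the edition/syntax declarations found in .proto source text."""
    found: Dict[str, str] = {}
    for keyword, value in _DECLARATION.findall(source):
        found.setdefault(keyword, value)  # this sketch keeps the first occurrence
    return found


def looks_like_year(edition: str) -> bool:
    """Editions are strings; a four-digit year is a convention, not a rule."""
    return len(edition) == 4 and edition.isdigit()


header = read_header('edition = "2022";\npackage example;\n')
print(header)  # {'edition': '2022'}
print(looks_like_year(header["edition"]))  # True
```

The value is kept as an opaque string, and the year check is reported separately rather than enforced, because the year format is only a convention.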
features option to descriptor.proto
This option will be uniformly defined as a repeated set of strings, which can encode either an opt-out of a specific feature (e.g. "-string_view") or, potentially, an opt-in to a future/experimental feature (e.g. "string_view"). The features option will be added to descriptor.proto for the following descriptor options:
Features are only respected when used in conjunction with the edition keyword. They are not validated for correctness, so that feature declarations remain forward/backward compatible across releases.
Features may be declared at any descriptor level; however, whether a feature influences descendant types is at the discretion of the protobuf team (e.g., a file-level feature opt-out could apply to all fields within the file, if desired).
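The following is a hypothetical Python model of how the opt-out/opt-in strings and the descendant behavior above might resolve. It is not protoc's implementation. It assumes that a leading "-" marks an opt-out and that later declarations override earlier ones. It also assumes that file-level features cascade to fields, which the text leaves to the protobuf team's discretion. The per-edition default set passed in is likewise an assumption.

```python
from typing import FrozenSet, Iterable, Optional


def apply_features(base: FrozenSet[str], declared: Iterable[str]) -> FrozenSet[str]:
    """Layer opt-outs ("-name") and opt-ins ("name") over a base feature set.

    Names are not validated: an unknown opt-in is carried along and an
    unknown opt-out is a no-op. Later entries override earlier ones.
    """
    result = set(base)
    for feature in declared:
        if feature.startswith("-"):
            result.discard(feature[1:])
        else:
            result.add(feature)
    return frozenset(result)


def resolve_field_features(
    edition: Optional[str],
    edition_defaults: FrozenSet[str],
    file_level: Iterable[str],
    field_level: Iterable[str],
) -> Optional[FrozenSet[str]]:
    """Resolve one field's features, assuming file-level features cascade."""
    if edition is None:
        # Without an edition keyword, features are not respected at all.
        return None
    file_resolved = apply_features(edition_defaults, file_level)
    # The field's own declarations are applied last, so they win.
    return apply_features(file_resolved, field_level)


defaults = frozenset({"string_view"})  # hypothetical edition defaults
print(resolve_field_features("2022", defaults, ["-string_view"], []))
# frozenset()
print(resolve_field_features("2022", defaults, ["-string_view"], ["string_view"]))
# frozenset({'string_view'})
```

In this sketch, a file-level "-string_view" followed by a field-level "string_view" opts that one field back in. A file with no edition keyword resolves to None, because its features are ignored.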
Features can be broken down into two main categories: language-specific and semantic.
Language-specific features pertain to the generated API for a given language. Referring to the protobuf breaking changes backlog, we can see some examples:
string_view
Enum#valueOf(int) API.
Language-specific features have no meaning for any other language, so other languages can ignore them entirely. They are, in essence, a private (tunneled) interface between the protobuf IDL and the respective code generator. Each language's code generator can independently decide what the “base” set of features is for any given edition, and each language defines its own migration path between editions.
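The “private interface” idea can be illustrated with a small, hypothetical Python model of code generators. Each generator recognizes only its own feature names, skips every other string, and chooses its own base set per edition. The Generator class, the feature name "enum_value_of" (standing in for the Enum#valueOf(int) example), and the base sets are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable


@dataclass
class Generator:
    """A hypothetical code generator and the features it understands."""

    language: str
    known_features: FrozenSet[str]
    # Each language chooses its own base set per edition (illustrative values).
    base_by_edition: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    def resolve(self, edition: str, declared: Iterable[str]) -> FrozenSet[str]:
        result = set(self.base_by_edition.get(edition, frozenset()))
        for feature in declared:
            opt_out = feature.startswith("-")
            name = feature[1:] if opt_out else feature
            if name not in self.known_features:
                continue  # not one of this generator's features: ignore it
            if opt_out:
                result.discard(name)
            else:
                result.add(name)
        return frozenset(result)


cpp = Generator("cpp", frozenset({"string_view"}), {"2022": frozenset({"string_view"})})
java = Generator("java", frozenset({"enum_value_of"}))
declared = ["-string_view", "enum_value_of"]  # one file, both languages' features
print(cpp.resolve("2022", declared))   # frozenset()
print(java.resolve("2022", declared))  # frozenset({'enum_value_of'})
```

Neither generator needs to know about the other's features, and each can change its per-edition base set without coordinating with other languages.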
Semantic features define behavior changes that apply to the protobuf data model, independent of language. These can also have API implications, but their meaning goes deeper than just a surface-level API. Some examples of semantic features:
UnknownFieldSet
Semantic features have significantly broader scope, since they must be respected across languages, and each language must implement the semantics correctly. This also implies that either (1) every language must know the canonical set of “base features” for each edition, or (2) the set of “default” features for the edition must be resolved in protoc itself and propagated explicitly into the descriptor.
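The two options for semantic features can be contrasted in a short, hypothetical Python sketch. Under option (1), each runtime carries the canonical per-edition table. Under option (2), protoc resolves the defaults once and writes the explicit result into a descriptor-like dict. The table contents, the feature names preserve_unknown_fields and utf8_validation, and the dict layout are assumptions for illustration; they are not descriptor.proto fields.

```python
from typing import Dict, FrozenSet, List

# Hypothetical canonical table of semantic base features per edition.
CANONICAL_BASE: Dict[str, FrozenSet[str]] = {
    "2022": frozenset({"preserve_unknown_fields", "utf8_validation"}),
}


def apply_declared(base: FrozenSet[str], declared: List[str]) -> FrozenSet[str]:
    """Apply "-name" opt-outs and "name" opt-ins to a base set."""
    out = set(base)
    for feature in declared:
        if feature.startswith("-"):
            out.discard(feature[1:])
        else:
            out.add(feature)
    return frozenset(out)


# Option (1): every runtime ships CANONICAL_BASE and resolves on its own.
def runtime_resolve_with_table(descriptor: Dict) -> FrozenSet[str]:
    return apply_declared(CANONICAL_BASE[descriptor["edition"]], descriptor["features"])


# Option (2): protoc resolves once and records the explicit result, so a
# runtime only reads the descriptor and needs no per-edition table.
def protoc_emit(edition: str, declared: List[str]) -> Dict:
    resolved = apply_declared(CANONICAL_BASE[edition], declared)
    return {"edition": edition, "resolved_features": sorted(resolved)}


def runtime_resolve_explicit(descriptor: Dict) -> FrozenSet[str]:
    return frozenset(descriptor["resolved_features"])


declared = ["-utf8_validation"]
option1 = runtime_resolve_with_table({"edition": "2022", "features": declared})
option2 = runtime_resolve_explicit(protoc_emit("2022", declared))
print(option1 == option2)  # True
```

Both paths arrive at the same set; the difference is where the table lives. Option (2) keeps it in one place, but it changes what descriptors carry.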
Changing descriptor.proto to reflect editions is a much more intrusive change than changing just the protobuf IDL. The protobuf IDL is parsed and resolved in protoc, and we have only a single implementation of that parser. Any change that can be resolved in the parser alone is relatively unintrusive (though there are build horizon issues since GCL parses protos in prod).
Revving descriptor.proto is a far more intrusive change that affects many downstream systems. Many systems access descriptors either through a descriptor API (for example, google::protobuf::Descriptor in C++) or by directly accessing a proto from descriptor.proto (e.g. google.protobuf.DescriptorProto). Any changes here need to be managed much more delicately.
syntax keyword from proto IDL
The syntax keyword shall no longer be required/observed when an edition keyword is present, as it is now considered redundant. If edition and syntax are both present, edition takes precedence and syntax is ignored.
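The precedence rule fits in a small decision function, shown here in Python. The sketch is illustrative only; the function name and the returned labels are hypothetical. For a file with neither keyword, it falls back to proto2, which is what protoc does today when the syntax statement is omitted.

```python
from typing import Optional


def baseline_semantics(edition: Optional[str], syntax: Optional[str]) -> str:
    """Choose a file's baseline semantics from its edition/syntax keywords."""
    if edition is not None:
        # edition takes precedence; any syntax declaration is ignored.
        return "edition " + edition + " (converged proto2/3)"
    if syntax is None:
        return "proto2"  # protoc's existing default when syntax is omitted
    if syntax in ("proto2", "proto3"):
        return syntax
    raise ValueError("unrecognized syntax: " + repr(syntax))


print(baseline_semantics("2022", "proto3"))  # edition 2022 (converged proto2/3)
print(baseline_semantics(None, "proto3"))    # proto3
```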
proto2 and proto3 to Editions + Features
Today's usage of syntax opaquely bundles a collection of implied feature flags that are set based on the presence of proto2 or proto3. This is often a source of confusion for customers (e.g. what am I gaining by moving to proto3? What am I losing?).
Defining editions/features as existing in a state of proto2/3 convergence enables customers to decide for themselves which features are important to their usage of protos.
Migrating existing users of proto2 and proto3 to editions with converged semantics would require a large-scale change to make their implicit behavior explicit. Here are examples of behavior implied today:
features for Large Deployments
A separate concept has been established to help mitigate the complexity of editions, progressive feature rollouts, and synchronization for larger proto projects.
This facility could be used, for example, to migrate existing usages of the syntax keyword to Editions + Features across google3.
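Whatever tooling performs it, migrating a file from the syntax keyword to an edition amounts to computing the feature strings that reproduce the behavior its syntax implied. The Python sketch below does this as a set difference against an edition's defaults. Both tables are hypothetical. The feature names, and which of them each syntax or edition enables, are illustrative assumptions; they are not the protobuf team's actual catalogue of proto2/proto3 differences.

```python
from typing import Dict, FrozenSet, List

# Hypothetical tables: names and memberships are illustrative only.
IMPLIED_BY_SYNTAX: Dict[str, FrozenSet[str]] = {
    "proto2": frozenset(),
    "proto3": frozenset({"open_enums", "implicit_presence", "utf8_validation"}),
}
EDITION_DEFAULTS: Dict[str, FrozenSet[str]] = {
    "2022": frozenset({"open_enums", "utf8_validation"}),
}


def migration_features(syntax: str, edition: str) -> List[str]:
    """Feature strings that keep a file's old behavior under a new edition."""
    implied = IMPLIED_BY_SYNTAX[syntax]
    defaults = EDITION_DEFAULTS[edition]
    # Turned on by the edition but not implied by the old syntax: opt out.
    opt_outs = ["-" + name for name in sorted(defaults - implied)]
    # Implied by the old syntax but not an edition default: opt in.
    opt_ins = sorted(implied - defaults)
    return opt_outs + opt_ins


print(migration_features("proto2", "2022"))  # ['-open_enums', '-utf8_validation']
print(migration_features("proto3", "2022"))  # ['implicit_presence']
```

Defaults the old syntax lacked become opt-outs ("-name"). Behavior the old syntax implied that the edition does not enable by default becomes an opt-in string. Applying the result to the edition defaults therefore reproduces the file's previous behavior.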
proto{2,3} Convergence Vision (not available externally)
Epochs for descriptor.proto (not available externally)
Rust editions: https://doc.rust-lang.org/edition-guide/editions/index.html