
Protobuf Editions for Schema Producers

Author: @fowles

Explains the expected use of and interaction with editions for schema providers and their customers.

Background

The Protobuf Editions project uses “editions” to allow Protobuf to evolve safely over time. This is accomplished primarily through “features”. The first edition (colloquially known as “Edition Zero”) will use features to unify proto2 and proto3 (Edition Zero Features). This document uses the definitions from Protobuf Editions: Rollout but focuses on the use cases of Schema Producers and Schema Consumers:

Schema Producers are teams that produce .proto files for the consumption of other teams.

Schema Consumers are the teams that consume those published .proto files.

As a reminder, features will generally not change the wire format of messages, and thus changing the edition of a .proto file will generally not change the wire format of its messages.

Initial Release

There will be a long period of time during which protoc is able to consume proto2, proto3, and editions files. Once every supported protoc release handles editions, schema producers should upgrade their published .proto files to edition zero. The protobuf team will provide a tool that upgrades proto2 and proto3 files to edition zero in a fully compatible way.

Edition Zero Features

In order to unify proto2 and proto3, “Edition Zero” takes an opinionated stance on which choices are good and which are bad: it chooses “good” defaults and requires explicit requests for the “bad” semantics (Edition Zero Features). Schema producers that are simply upgrading existing .proto files should publish these files exactly as produced by the upgrade tool. This will ensure wire compatibility. Newly published .proto files should use the default values from this first edition.

Steady State Flow

For the most part, editions should not disturb the general pattern for schema producers. Any schema producer should already specify what versions of protobuf they support and should not support versions of protobuf that are themselves unsupported. Schema producers should generally publish all of their .proto files with a consistent edition for the simplicity of their users. When updating the edition for their .proto files, producers should target an edition supported by all of the versions of protobuf in their support matrix. A good rule of thumb is to target the newest edition supported by the oldest release of protobuf in the support matrix.
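The rule of thumb above can be sketched as a small lookup. This is only an illustration: the support matrix, the release numbers, and the year-like edition names are hypothetical placeholders, not real protobuf release data.

```python
# Hypothetical support matrix: protobuf release -> editions that release can read.
# The release numbers and edition names are made up for illustration.
SUPPORTED_EDITIONS = {
    (4, 25): ["2023"],
    (4, 27): ["2023", "2024"],
    (4, 29): ["2023", "2024", "2025"],
}


def target_edition(support_matrix):
    """Return the newest edition understood by the oldest supported release."""
    if not support_matrix:
        raise ValueError("support matrix is empty")
    oldest_release = min(support_matrix)
    # Editions are assumed to be year-like strings, so int() orders them.
    return max(support_matrix[oldest_release], key=int)


def editions_supported_everywhere(support_matrix):
    """Return every edition that all releases in the matrix can read."""
    edition_sets = [set(editions) for editions in support_matrix.values()]
    return sorted(set.intersection(*edition_sets), key=int)


print(target_edition(SUPPORTED_EDITIONS))
print(editions_supported_everywhere(SUPPORTED_EDITIONS))
```

The second helper is a cross-check: if the edition chosen by the rule of thumb is not in the set supported by every release, the matrix itself is inconsistent and worth reviewing before publishing.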

Best Practices

Publish `.proto` files

Schema producers should publish .proto files and not generated sources. This is already the case and editions do not change it. Publishing generated sources can lead to mismatches between the compiler and runtime environment used. Protobuf does not support mixed generation/runtime configurations and sometimes security patches require updating both.

Minimize use of features

Code-generator-specific features (like features.(pb.cpp).string_field_type) should only be applied within the context of a single code base. Consumers of published schemas may wish to add generator-specific features (either by hand or with an automated .proto refactoring tool), but producers should not force them onto users.
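A producer could check for this before publishing. The sketch below is a rough text scan, not a real .proto parser: it assumes generator-specific features always appear in the form features.(scope).name, and the sample schema and the VIEW and CORD values in it are hypothetical.

```python
import re

# Matches extension-scoped features such as features.(pb.cpp).string_field_type.
# The parenthesised name is the generator scope, e.g. "pb.cpp".
GENERATOR_FEATURE = re.compile(r"features\.\(\s*([\w.]+)\s*\)\.(\w+)")


def find_generator_features(proto_source):
    """Return (line_number, scope, feature_name) for each generator-specific feature."""
    findings = []
    for line_number, line in enumerate(proto_source.splitlines(), start=1):
        # Drop trailing // comments so commented-out options are not reported.
        # Limitation: block comments and "//" inside string literals are not handled.
        code = line.split("//", 1)[0]
        for match in GENERATOR_FEATURE.finditer(code):
            findings.append((line_number, match.group(1), match.group(2)))
    return findings


# Hypothetical schema text, used only to exercise the check.
EXAMPLE = """\
message Row {
  string key = 1 [features.(pb.cpp).string_field_type = VIEW];
  string value = 2;  // features.(pb.cpp).string_field_type = CORD
}
"""

for finding in find_generator_features(EXAMPLE):
    print(finding)
```

Only the option on line 2 is reported; the one inside the comment on line 3 is ignored. A production check would more likely work from a parsed descriptor than from raw text.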

Client Usage Patterns

Consumers’ usage is heavily constrained by their build system. Language-agnostic build systems, like Bazel, can run protoc as one of the build steps. Language-specific build systems, like Maven or Go, make running protoc more difficult, so consumers often avoid it. Languages like Python that traditionally lack a build system are an even more extreme case.

Running `protoc` Directly

Because language-specific features will not change the wire format of messages, clients will be able to update to newer editions or set features appropriate to their environment while still connecting to external endpoints.

In particular, protobuf will provide two distinct mechanisms for supporting these users. First, we will provide tools for automating updates to .proto files in a safe way. These tools will apply semantic patches to .proto files, and users can then commit the results into source control. Second, we will provide primitives in protoc to compile a .proto file and a semantic patch as a set of inputs, so that users never have to materialize the modified .proto file. The Protobuf team will investigate adding support for semantic patches when it addresses Bazel rules.

In the long term, we want a Bazel rule (and possibly similar rules for other build systems) that seamlessly packages such changes, like:

proto_library(
    name = "cloud_spanner_proto",
    modifications = ["cloud_spanner.change_spec"],
    deps = ["@com_google_cloud//spanner"],
)

Publishing Generated Libraries

As a reminder, publishing generated code is not a good idea. It frequently runs afoul of runtime/generation time mismatches and is an active source of confusion where users are unable to reason about what version they are on.

Teams determined to do this anyway should adhere to the following best practices.

* Publish generated libraries using semver.
* Publish both the generated code and the protobuf runtime.
* Pin the protobuf runtime library to the exact version used to generate the library.
* When upgrading the protoc generator for a major, minor, or micro release, increment the corresponding number in the published library’s version.
* When updating the edition of the .proto file, increment the major number of the published library’s version.

It is worth noting that only the last bullet point is new; everything else is a restatement of current best practice.
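The two versioning rules in the list can be sketched as a small helper. This is a minimal illustration under stated assumptions: Version, bump, and next_library_version are hypothetical names, and resetting the less significant components on a bump follows the usual semver convention rather than anything this document specifies.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    micro: int

    def __str__(self):
        return f"{self.major}.{self.minor}.{self.micro}"


def bump(version, level):
    """Increment one component and reset the less significant ones."""
    if level == "major":
        return Version(version.major + 1, 0, 0)
    if level == "minor":
        return Version(version.major, version.minor + 1, 0)
    if level == "micro":
        return Version(version.major, version.minor, version.micro + 1)
    raise ValueError(f"unknown level: {level!r}")


def next_library_version(current, protoc_upgrade=None, edition_changed=False):
    """Pick the next published version from what changed in this release.

    protoc_upgrade is "major", "minor", "micro", or None if protoc is unchanged.
    """
    # An edition change always forces a major bump, whatever protoc did.
    level = "major" if edition_changed else protoc_upgrade
    if level is None:
        return current
    return bump(current, level)


current = Version(2, 3, 1)
print(next_library_version(current, protoc_upgrade="minor"))
print(next_library_version(current, protoc_upgrade="micro", edition_changed=True))
```

With a current version of 2.3.1, a minor protoc upgrade yields 2.4.0, while any release that also changes the edition yields 3.0.0.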