Protobuf Editions Design: Features

Author: @haberman, @fowles

Approved: 2022-10-13

A proposal to use custom options as our way of defining and representing features.

Background

The Protobuf Editions project uses “editions” to allow Protobuf to safely evolve over time. An edition is formally a set of “features” with a default value per feature. The set of features, or the default value for a feature, can only change with the introduction of a new edition. Features define the specific points of change and evolution on a per-entity basis within a .proto file (entities being files, messages, fields, or any other lexical element in the file). The design in this doc supplants an earlier design, which used strings for feature definition.

Protobuf already supports custom options and we will leverage these to provide a rich syntax without introducing new syntactic forms into Protobuf.

Sample Usage

Here is a small sample of feature usage to give a flavor of how it looks:

edition = "2023";

package experimental.users.kfm.editions;

import "net/proto2/proto/features_cpp.proto";

option features.repeated_field_encoding = EXPANDED;
option features.enum = OPEN;
option features.(pb.cpp).string_field_type = STRING;
option features.(pb.cpp).namespace = "kfm::proto_experiments";

message Lab {
  // `Mouse` is open as it inherits the file's value.
  enum Mouse {
    UNKNOWN_MOUSE = 0;
    PINKY = 1;
    THE_BRAIN = 2;
  }
  repeated Mouse mice = 1 [features.repeated_field_encoding = PACKED];

  string name = 2;
  string address = 3 [features.(pb.cpp).string_field_type = CORD];
  string function = 4 [features.(pb.cpp).string_field_type = STRING_VIEW];
}

enum ColorChannel {
  // Turn off the option from the surrounding file
  option features.enum = CLOSED;

  UNKNOWN_COLOR_CHANNEL = 0;
  RED = 1;
  BLUE = 2;
  GREEN = 3;
  ALPHA = 4;
}

Language-Specific Features

We will use extensions to manage features specific to individual code generators.

// In net/proto2/proto/descriptor.proto:
syntax = "proto2";
package proto2;

message Features {
  ...
  extensions 1000;  // for features_cpp.proto
  extensions 1001;  // for features_java.proto
}

This will allow third-party code generators to use editions for their own evolution as long as they reserve a single extension number in descriptor.proto. Using this from a .proto file would look like this:

edition = "2023";

import "third_party/protobuf/compiler/cpp/features_cpp.proto";

message Bar {
  optional string str = 1 [features.(pb.cpp).string_field_type = CORD];
}

Inheritance

To support inheritance, we will specify a single Features message that appears as a field in every kind of options message:

// In net/proto2/proto/descriptor.proto:
syntax = "proto2";
package proto2;

message Features {
  ...
}

message FileOptions {
  optional Features features = ..;
}

message MessageOptions {
  optional Features features = ..;
}
// All the other `*Options` protos.

At the implementation level, feature inheritance is exactly the behavior of MergeFrom:

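// Resolves `child` in place: start from the parent's values, then let the
// child's explicitly set fields override them.
void InheritFrom(const Features& parent, Features* child) {
  Features tmp(parent);
  tmp.MergeFrom(*child);  // MergeFrom takes a const reference, not a pointer.
  child->Swap(&tmp);
}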

which means that custom backends will be able to faithfully implement inheritance without difficulty.
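
To make the merge semantics concrete without depending on the protobuf runtime, here is a minimal, self-contained sketch. `FeatureSet`, its two fields, `MergeInto`, and `InheritFeatures` are hypothetical stand-ins for the real Features message and its generated MergeFrom; only the rule that explicitly set singular fields override inherited ones is modeled.

```cpp
#include <optional>

// Hypothetical stand-in for the Features message: each feature is an
// optional field, so "unset" is distinguishable from any explicit value.
enum class EnumType { kOpen, kClosed };
enum class RepeatedFieldEncoding { kPacked, kExpanded };

struct FeatureSet {
  std::optional<EnumType> enum_type;
  std::optional<RepeatedFieldEncoding> repeated_field_encoding;
};

// Mimics MergeFrom for singular fields: a value set in `from` overwrites
// the value in `to`; an unset value in `from` leaves `to` untouched.
inline void MergeInto(const FeatureSet& from, FeatureSet& to) {
  if (from.enum_type) to.enum_type = from.enum_type;
  if (from.repeated_field_encoding) {
    to.repeated_field_encoding = from.repeated_field_encoding;
  }
}

// Starts from the parent's values, then lets the child's explicit
// settings win. Returns the resolved set instead of mutating the child.
inline FeatureSet InheritFeatures(const FeatureSet& parent,
                                  const FeatureSet& child) {
  FeatureSet resolved = parent;
  MergeInto(child, resolved);
  return resolved;
}
```

With a file-level set of `OPEN` and `EXPANDED` and a field that sets only `repeated_field_encoding = PACKED`, the resolved field features keep `OPEN` from the file and take `PACKED` from the field, which matches the `mice` field in the sample usage.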

Target Attributes

While inheritance can be useful for minimizing changes or pushing defaults broadly, it can be overused in ways that would make simple refactoring of .proto files harder. Additionally, not all features are meaningful on all entities (for example, features.enum = OPEN is meaningless on a field).

To avoid these issues, we will introduce “target” attributes on features (similar in concept to the “target” attribute on Java annotations).

enum FeatureTargetType {
  FILE = 0;
  MESSAGE = 1;
  ENUM = 2;
  FIELD = 3;
  ...
}

These will restrict the set of entities to which a feature may be attached.

message Features {
  ...

  enum EnumType {
    OPEN = 0;
    CLOSED = 1;
  }
  optional EnumType enum = 2 [
      target = ENUM
  ];
}
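
The check these attributes imply can be sketched as follows. This is an illustration only: `FeatureTargets` and `MayAttach` are hypothetical names, and the sketch assumes a feature can list more than one permitted target (the text shows a single `target` value), which is what would let a file set `features.enum` for the enums it contains.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Mirrors the FeatureTargetType enum above (elided values omitted).
enum class FeatureTargetType { kFile, kMessage, kEnum, kField };

// Hypothetical record of one feature and the entity kinds it may be
// attached to. Allowing several targets is an assumption of this sketch.
struct FeatureTargets {
  std::string feature_name;
  std::vector<FeatureTargetType> allowed;
};

// Returns true if `feature` may be written on an entity of kind `entity`.
inline bool MayAttach(const FeatureTargets& feature,
                      FeatureTargetType entity) {
  return std::find(feature.allowed.begin(), feature.allowed.end(), entity) !=
         feature.allowed.end();
}
```

A compiler using a check like this would reject `features.enum = OPEN` on a field while still accepting it on an enum.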

Retention

To reduce the size of descriptors in protobuf runtimes, features will be permitted to specify retention rules (again similar in concept to “retention” attributes on Java annotations).

enum FeatureRetention {
  SOURCE = 0;
  RUNTIME = 1;
}
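
By analogy with Java annotations, one plausible reading is that only RUNTIME features are kept when descriptors are embedded in a runtime, while SOURCE features are visible only to protoc and code generators. The sketch below filters on that assumption; `FeatureSetting`, `StripForRuntime`, and the string-keyed map are hypothetical and not part of the proposal.

```cpp
#include <map>
#include <string>

// Mirrors the FeatureRetention enum above.
enum class FeatureRetention { kSource, kRuntime };

// Hypothetical record of one explicitly set feature: its textual value
// and the retention declared on the feature's definition.
struct FeatureSetting {
  std::string value;
  FeatureRetention retention;
};

// Returns only the settings whose retention is RUNTIME, i.e. the ones
// assumed to survive into descriptors shipped with a runtime.
inline std::map<std::string, FeatureSetting> StripForRuntime(
    const std::map<std::string, FeatureSetting>& all) {
  std::map<std::string, FeatureSetting> kept;
  for (const auto& [name, setting] : all) {
    if (setting.retention == FeatureRetention::kRuntime) {
      kept.emplace(name, setting);
    }
  }
  return kept;
}
```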

Specification of an Edition

An edition is, effectively, an instance of the Features proto which forms the base for performing inheritance using MergeFrom. This allows protoc and specific language generators to leverage existing formats (like text-format) for specifying the value of features at a given edition.

Although field defaults might naively seem like the right approach, they do not quite work, because the default for a feature depends on the edition. Instead, we propose adding the following to the protoc-provided features.proto:

message Features {
  // ...
  message EditionDefault {
    optional string edition = 1;
    optional string default = 2;  // Textproto value.
  }

  extend FieldOptions {
    // Ideally this is a map, but map extensions are not permitted...
    repeated EditionDefault edition_defaults = 9001;
  }
}

To build the edition defaults for a particular edition (call it current) in the context of a particular file foo.proto, we execute the following algorithm:

  1. Construct a new Features feats.
  2. For each field in Features, take the value of the Features.edition_defaults option (call it defaults), and sort it by the value of edition (per the total order for edition names; see Life of an Edition).
  3. Binary-search for the latest edition in defaults that is earlier than or equal to current.
    1. If the field is of singular, scalar type, use that value as the value of the field in feats.
    2. Otherwise, the value of the field in feats is given by merging all of the values for editions earlier than or equal to current, starting from the oldest edition.
  4. For the purposes of this algorithm, Features’s fields all behave as if they were required; failure to find a default explicitly via the editions default search mechanism should result in a compilation error, because it means the file’s edition is too old.
  5. For each extension of Features that is visible from foo.proto via imports, perform the same algorithm as above to construct the editions default for that extension message, and add it to feats.

This algorithm has the following properties:

  • Language-scoped features are discovered via imports, which a file already needs in order to use them in the first place.
  • Every value is set explicitly, so we correctly reject too-old files.
  • Files from “the future” will not be rejected out of hand by the algorithm, allowing us to provide a flag like --allow-experimental-editions to make it easier for backends to implement a new edition.
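
Steps 2 through 4 can be sketched for a single singular, scalar feature as follows. The names `EditionDefaultEntry`, `ResolveDefault`, and `EditionLess` are hypothetical, and comparing edition names as plain strings is a simplifying assumption standing in for the total order on edition names.

```cpp
#include <algorithm>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Mirrors the EditionDefault message: a textproto value keyed by edition.
struct EditionDefaultEntry {
  std::string edition;
  std::string value;
};

// Assumption: plain string comparison orders edition names correctly
// (it does for names like "2023" and "2024").
inline bool EditionLess(const std::string& a, const std::string& b) {
  return a < b;
}

// Returns the default from the latest entry whose edition is earlier than
// or equal to `current`, or nullopt if there is none. A caller would turn
// nullopt into a compilation error (the file's edition is too old).
inline std::optional<std::string> ResolveDefault(
    std::vector<EditionDefaultEntry> defaults, const std::string& current) {
  std::sort(defaults.begin(), defaults.end(),
            [](const EditionDefaultEntry& a, const EditionDefaultEntry& b) {
              return EditionLess(a.edition, b.edition);
            });
  // Binary search for the first entry strictly later than `current`.
  auto it = std::upper_bound(
      defaults.begin(), defaults.end(), current,
      [](const std::string& cur, const EditionDefaultEntry& entry) {
        return EditionLess(cur, entry.edition);
      });
  if (it == defaults.begin()) return std::nullopt;
  return std::prev(it)->value;
}
```

Given defaults declared at "2023" and at a hypothetical "2024", a file at edition "2025" resolves to the "2024" value rather than being rejected, while a file at "2022" finds no entry and would fail to compile.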

Edition Zero Features

Putting the parts together, we can offer a potential Features message for edition zero (see Edition Zero Features):

message Features {
  enum FieldPresence {
    EXPLICIT = 0;
    IMPLICIT = 1;
    LEGACY_REQUIRED = 2;
  }
  optional FieldPresence field_presence = 1 [
      retention = RUNTIME,
      target = FIELD,
      (edition_defaults) = {
        edition: "2023", default: "EXPLICIT"
      }
  ];

  enum EnumType {
    OPEN = 0;
    CLOSED = 1;
  }
  optional EnumType enum = 2 [
      retention = RUNTIME,
      target = ENUM,
      (edition_defaults) = {
        edition: "2023", default: "OPEN"
      }
  ];

  enum RepeatedFieldEncoding {
    PACKED = 0;
    EXPANDED = 1;
  }
  optional RepeatedFieldEncoding repeated_field_encoding = 3 [
      retention = RUNTIME,
      target = FIELD,
      (edition_defaults) = {
        edition: "2023", default: "PACKED"
      }
  ];

  enum StringFieldValidation {
    REQUIRED = 0;
    HINT = 1;
    SKIP = 2;
  }
  optional StringFieldValidation string_field_validation = 4 [
      retention = RUNTIME,
      target = FIELD,
      (edition_defaults) = {
        edition: "2023", default: "REQUIRED"
      }
  ];

  enum MessageEncoding {
    LENGTH_PREFIXED = 0;
    DELIMITED = 1;
  }
  optional MessageEncoding message_encoding = 5 [
      retention = RUNTIME,
      target = FIELD,
      (edition_defaults) = {
        edition: "2023", default: "LENGTH_PREFIXED"
      }
  ];

  extensions 1000;  // for features_cpp.proto
  extensions 1001;  // for features_java.proto
}