# How To Implement Field Presence for Proto3

Protobuf release 3.12 adds experimental support for `optional` fields in
proto3. Proto3 optional fields track presence like in proto2. For background
information about what presence tracking means, please see
[docs/field_presence](field_presence.md).

## Document Summary

This document is targeted at developers who own or maintain protobuf code
generators. All code generators will need to be updated to support proto3
optional fields. First-party code generators developed by Google are being
updated already. However, third-party code generators will need to be updated
independently by their authors. This includes:

- implementations of Protocol Buffers for other languages.
- alternate implementations of Protocol Buffers that target specialized use
  cases.
- RPC code generators that create generated APIs for service calls.
- code generators that implement some utility code on top of protobuf generated
  classes.

While this document speaks in terms of "code generators", these same principles
apply to implementations that dynamically generate a protocol buffer API "on the
fly", directly from a descriptor, in languages that support this kind of usage.

## Background

Presence tracking was added to proto3 in response to user feedback, both from
inside Google and [from open-source
users](https://github.com/protocolbuffers/protobuf/issues/1606). The [proto3
wrapper
types](https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/wrappers.proto)
were previously the only supported presence mechanism for proto3. Users have
pointed to both efficiency and usability issues with the wrapper types.

Presence in proto3 uses exactly the same syntax and semantics as in proto2.
Proto3 fields marked `optional` will track presence like proto2, while fields
without any label (known as "singular fields") will continue to omit presence
information. The `optional` keyword was chosen to minimize differences with
proto2.

Unfortunately, for the current descriptor protos and `Descriptor` API (as of
3.11.4) it is not possible to use the same representation as proto2. Proto3
descriptors already use `LABEL_OPTIONAL` for proto3 singular fields, which do
not track presence. There is a lot of existing code that reflects over proto3
protos and assumes that `LABEL_OPTIONAL` in proto3 means "no presence." Changing
the semantics now would be risky, since old software would likely drop proto3
presence information, which would be a data loss bug.

To minimize this risk we chose a descriptor representation that is semantically
compatible with existing proto3 reflection. Every proto3 optional field is
placed into a one-field `oneof`. We call this a "synthetic" oneof, as it was not
present in the source `.proto` file.

Since oneof fields in proto3 already track presence, existing proto3
reflection-based algorithms should correctly preserve presence for proto3
optional fields with no code changes. For example, the JSON and TextFormat
parsers/serializers in C++ and Java did not require any changes to support
proto3 presence. This is the major benefit of synthetic oneofs.
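
To make the descriptor representation concrete, here is a minimal sketch of the rewrite. `FieldProto`, `MessageProto`, and `AddProto3OptionalField` are hypothetical, simplified stand-ins invented for illustration; they are not the real `FieldDescriptorProto` and `DescriptorProto` classes, and the sketch ignores details such as oneof name collisions.

```c++
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for FieldDescriptorProto and
// DescriptorProto. Only the members needed for this sketch are modeled.
struct FieldProto {
  std::string name;
  int number = 0;
  bool proto3_optional = false;
  int oneof_index = -1;  // -1 means "not in any oneof".
};

struct MessageProto {
  std::vector<FieldProto> fields;
  std::vector<std::string> oneof_names;
};

// Wraps a proto3 `optional` field in its own one-field synthetic oneof.
// Assumes all real oneofs were already added, so synthetic oneofs end up last.
void AddProto3OptionalField(MessageProto& message, const std::string& name,
                            int number) {
  FieldProto field;
  field.name = name;
  field.number = number;
  field.proto3_optional = true;
  field.oneof_index = static_cast<int>(message.oneof_names.size());
  message.oneof_names.push_back("_" + name);  // e.g. "_foo" for field "foo".
  message.fields.push_back(field);
}
```

Old reflection code that only understands oneofs would see `foo` as the sole member of `_foo` and track its presence through the usual oneof path.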

This design does leave some cruft in descriptors. Synthetic oneofs are a
compatibility measure that we can hopefully clean up in the future. For now
though, it is important to preserve them across different descriptor formats and
APIs. It is never safe to drop synthetic oneofs from a proto schema. Code
generators can (and should) skip synthetic oneofs when generating a user-facing
API or user-facing documentation. But for any schema representation that is
consumed programmatically, it is important to keep the synthetic oneofs around.

In APIs it can be helpful to offer separate accessors that refer to "real"
oneofs (see [API Changes](#api-changes) below). This is a convenient way to omit
synthetic oneofs in code generators.

## Updating a Code Generator

When a user adds an `optional` field to proto3, this is internally rewritten as
a one-field oneof, for backward-compatibility with reflection-based algorithms:

```protobuf
syntax = "proto3";

message Foo {
  // Experimental feature, not generally supported yet!
  optional int32 foo = 1;

  // Internally rewritten to:
  // oneof _foo {
  //   int32 foo = 1 [proto3_optional=true];
  // }
  //
  // We call _foo a "synthetic" oneof, since it was not created by the user.
}
```

As a result, the two main goals when updating a code generator are:

1. Give `optional` fields like `foo` normal field presence, as described in
   [docs/field_presence](field_presence.md). If your implementation already
   supports proto2, a proto3 `optional` field should use exactly the same API
   and internal implementation as proto2 `optional`.
2. Avoid generating any oneof-based accessors for the synthetic oneof. Its only
   purpose is to make reflection-based algorithms work properly if they are
   not aware of proto3 presence. The synthetic oneof should not appear anywhere
   in the generated API.

### Satisfying the Experimental Check

If you try to run `protoc` on a file with proto3 `optional` fields, you will get
an error because the feature is still experimental:

```
$ cat test.proto
syntax = "proto3";

message Foo {
  // Experimental feature, not generally supported yet!
  optional int32 a = 1;
}
$ protoc --cpp_out=. test.proto
test.proto: This file contains proto3 optional fields, but --experimental_allow_proto3_optional was not set.
```

There are two options for getting around this error:

1. Pass `--experimental_allow_proto3_optional` to protoc.
2. Make your filename (or a directory name) contain the string
   `test_proto3_optional`. This indicates that the proto file is specifically
   for testing proto3 optional support, so the check is suppressed.

These options are demonstrated below:

```
# One option:
$ protoc test.proto --cpp_out=. --experimental_allow_proto3_optional

# Another option:
$ cp test.proto test_proto3_optional.proto
$ protoc test_proto3_optional.proto --cpp_out=.
$
```
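
The rule behind these two options can be sketched as a single predicate. The function name `Proto3OptionalAllowed` is hypothetical and this is only an illustration of the behavior described above, not the actual check inside `protoc`.

```c++
#include <string>

// Hypothetical helper: returns true if a file containing proto3 `optional`
// fields passes the experimental check, either because the flag was passed
// or because the path contains the special test marker.
bool Proto3OptionalAllowed(const std::string& path, bool experimental_flag) {
  if (experimental_flag) return true;
  return path.find("test_proto3_optional") != std::string::npos;
}
```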

The experimental check will be removed in a future release, once we are ready
to make this feature generally available. Ideally this will happen for the 3.13
release of protobuf, sometime in mid-2020, but there is not a specific date set
for this yet. Some of the timing will depend on feedback we get from the
community, so if you have questions or concerns please get in touch via a
GitHub issue.

### Signaling That Your Code Generator Supports Proto3 Optional

If you now try to invoke your own code generator with the test proto, you will
run into a different error:

```
$ protoc test_proto3_optional.proto --my_codegen_out=.
test_proto3_optional.proto: is a proto3 file that contains optional fields, but
code generator --my_codegen_out hasn't been updated to support optional fields in
proto3. Please ask the owner of this code generator to support proto3 optional.
```

This check exists to make sure that code generators get a chance to update
before they are used with proto3 `optional` fields. Without this check an old
code generator might emit obsolete generated APIs (like accessors for a
synthetic oneof) and users could start depending on these. That would create
a legacy migration burden once a code generator actually implements the feature.

To signal that your code generator supports `optional` fields in proto3, you
need to tell `protoc` what features you support. The method for doing this
depends on whether you are using the C++
`google::protobuf::compiler::CodeGenerator`
framework or not.

If you are using the CodeGenerator framework:

```c++
class MyCodeGenerator : public google::protobuf::compiler::CodeGenerator {
 public:
  // Add this method.
  uint64_t GetSupportedFeatures() const override {
    // Indicate that this code generator supports proto3 optional fields.
    // (Note: don't release your code generator with this flag set until you
    // have actually added and tested your proto3 support!)
    return FEATURE_PROTO3_OPTIONAL;
  }
};
```

If you are generating code using raw `CodeGeneratorRequest` and
`CodeGeneratorResponse` messages from `plugin.proto`, the change will be very
similar:

```c++
void GenerateResponse() {
  CodeGeneratorResponse response;
  response.set_supported_features(CodeGeneratorResponse::FEATURE_PROTO3_OPTIONAL);

  // Generate code...
}
```

Once you have added this, you should now be able to successfully use your code
generator to generate a file containing proto3 optional fields:

```
$ protoc test_proto3_optional.proto --my_codegen_out=.
```
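
Since `supported_features` is a bitmask, the gate that produces the error above can be pictured roughly as follows. The enum values mirror `CodeGeneratorResponse.Feature` in `plugin.proto`; the helper `GeneratorMayProcessFile` is a hypothetical name, and the sketch only assumes how such a check could look rather than reproducing `protoc` itself.

```c++
#include <cstdint>

// Values mirror CodeGeneratorResponse.Feature in plugin.proto.
enum Feature : uint64_t {
  FEATURE_NONE = 0,
  FEATURE_PROTO3_OPTIONAL = 1,
};

// Hypothetical helper: returns true if a generator that reported
// `supported_features` may be run on a file.
bool GeneratorMayProcessFile(uint64_t supported_features,
                             bool file_has_proto3_optional) {
  // Files without proto3 optional fields need no special support.
  if (!file_has_proto3_optional) return true;
  return (supported_features & FEATURE_PROTO3_OPTIONAL) != 0;
}
```

Testing the bit with `&` rather than comparing for equality keeps the check valid if more feature bits are reported alongside this one.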

### Updating Your Code Generator

The next step is to actually add support for proto3 optional to your code
generator. The goal is to recognize proto3 optional fields as optional, and
suppress any output from synthetic oneofs.

If your code generator does not currently support proto2, you will need to
design an API and implementation for supporting presence in scalar fields.
Generally this means:

- allocating a bit inside the generated class to represent whether a given field
  is present or not.
- exposing a `has_foo()` method for each field to return the value of this bit.
- making the parser set this bit when a value is parsed from the wire.
- making the serializer test this bit to decide whether to serialize.
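
The four steps above can be sketched with a hand-written class for `message Foo { optional int32 foo = 1; }`. This is only an illustration under simplifying assumptions: the member names (`has_bits_`, `kFooBit`) are hypothetical, the parser understands only field 1, and real generated code is considerably more involved.

```c++
#include <cstddef>
#include <cstdint>
#include <vector>

// Hand-written stand-in for what a generator might emit for:
//   message Foo { optional int32 foo = 1; }
class Foo {
 public:
  bool has_foo() const { return (has_bits_ & kFooBit) != 0; }
  int32_t foo() const { return foo_; }
  void set_foo(int32_t value) {
    foo_ = value;
    has_bits_ |= kFooBit;
  }
  void clear_foo() {
    foo_ = 0;
    has_bits_ &= ~kFooBit;
  }

  // Serializes field 1 only if its hasbit is set, even when the value is 0.
  std::vector<uint8_t> Serialize() const {
    std::vector<uint8_t> out;
    if (!has_foo()) return out;
    out.push_back(0x08);  // Tag: field number 1, wire type 0 (varint).
    // int32 values are sign-extended to 64 bits before varint encoding.
    uint64_t bits = static_cast<uint64_t>(static_cast<int64_t>(foo_));
    while (bits >= 0x80) {
      out.push_back(static_cast<uint8_t>(bits | 0x80));
      bits >>= 7;
    }
    out.push_back(static_cast<uint8_t>(bits));
    return out;
  }

  // Parses the encoding produced by Serialize(). Seeing the field on the wire
  // sets the hasbit. Unknown fields are rejected to keep the sketch short.
  bool Parse(const std::vector<uint8_t>& data) {
    clear_foo();
    size_t pos = 0;
    while (pos < data.size()) {
      if (data[pos++] != 0x08) return false;
      uint64_t bits = 0;
      int shift = 0;
      bool done = false;
      while (pos < data.size() && shift < 64) {
        const uint8_t byte = data[pos++];
        bits |= static_cast<uint64_t>(byte & 0x7F) << shift;
        shift += 7;
        if ((byte & 0x80) == 0) {
          done = true;
          break;
        }
      }
      if (!done) return false;  // Truncated or overlong varint.
      set_foo(static_cast<int32_t>(bits));
    }
    return true;
  }

 private:
  static constexpr uint32_t kFooBit = 1u << 0;
  uint32_t has_bits_ = 0;
  int32_t foo_ = 0;
};
```

The key behavior is that an explicitly set zero is still serialized and round-trips with `has_foo()` returning true, which a proto3 singular field cannot express.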

If your code generator already supports proto2, then most of your work is
already done. All you need to do is make sure that proto3 optional fields have
exactly the same API and behave in exactly the same way as proto2 optional
fields.

From experience updating several of Google's code generators, most of the
updates that are required fall into one of several patterns. Here we will show
the patterns in terms of the C++ CodeGenerator framework. If you are using
`CodeGeneratorRequest` and `CodeGeneratorResponse` directly, you can translate
the C++ examples to your own language, referencing the C++ implementation of
these methods where required.

#### To test whether a field should have presence

Old:

```c++
bool MessageHasPresence(const google::protobuf::Descriptor* message) {
  return message->file()->syntax() ==
         google::protobuf::FileDescriptor::SYNTAX_PROTO2;
}
```

New:

```c++
// Presence is no longer a property of a message, it's a property of individual
// fields.
bool FieldHasPresence(const google::protobuf::FieldDescriptor* field) {
  return field->has_presence();
  // Note, the above will return true for fields in a oneof.
  // If you want to filter out oneof fields, write this instead:
  //   return field->has_presence() && !field->real_containing_oneof();
}
```

#### To test whether a field is a member of a oneof

Old:

```c++
bool FieldIsInOneof(const google::protobuf::FieldDescriptor* field) {
  return field->containing_oneof() != nullptr;
}
```

New:

```c++
bool FieldIsInOneof(const google::protobuf::FieldDescriptor* field) {
  // real_containing_oneof() returns nullptr for synthetic oneofs.
  return field->real_containing_oneof() != nullptr;
}
```

#### To iterate over all oneofs

Old:

```c++
void IterateOverOneofs(const google::protobuf::Descriptor* message) {
  for (int i = 0; i < message->oneof_decl_count(); i++) {
    const google::protobuf::OneofDescriptor* oneof = message->oneof_decl(i);
    // ...
  }
}
```

New:

```c++
void IterateOverOneofs(const google::protobuf::Descriptor* message) {
  // Real oneofs are always first, and real_oneof_decl_count() will return the
  // total number of oneofs, excluding synthetic oneofs.
  for (int i = 0; i < message->real_oneof_decl_count(); i++) {
    const google::protobuf::OneofDescriptor* oneof = message->oneof_decl(i);
    // ...
  }
}
```

## Updating Reflection

If your implementation offers reflection, there are a few other changes to make:

### API Changes

The API for reflecting over fields and oneofs should make the following changes.
These match the changes implemented in C++ reflection.

1. Add a `FieldDescriptor::has_presence()` method returning `bool`
   (adjusted to your language's naming convention). This should return true
   for all fields that have explicit presence, as documented in
   [docs/field_presence](field_presence.md). In particular, this includes
   fields in a oneof, proto2 scalar fields, and proto3 `optional` fields.
   This accessor will allow users to query what fields have presence without
   thinking about the difference between proto2 and proto3.
2. As a corollary of (1), please do *not* expose an accessor for the
   `FieldDescriptorProto.proto3_optional` field. We want to avoid having
   users implement any proto2/proto3-specific logic. Users should use the
   `has_presence()` function instead.
3. You may also wish to add a `FieldDescriptor::has_optional_keyword()` method
   returning `bool`, which indicates whether the `optional` keyword is present.
   Message fields will always return `true` for `has_presence()`, so this method
   can allow a user to know whether the user wrote `optional` or not. It can
   occasionally be useful to have this information, even though it does not
   change the presence semantics of the field.
4. If your reflection API may be used for a code generator, you may wish to
   implement methods to help users tell the difference between real and
   synthetic oneofs. In particular:
   - `OneofDescriptor::is_synthetic()`: returns true if this is a synthetic
     oneof.
   - `FieldDescriptor::real_containing_oneof()`: like `containing_oneof()`,
     but returns `nullptr` if the oneof is synthetic.
   - `Descriptor::real_oneof_decl_count()`: like `oneof_decl_count()`, but
     returns the number of real oneofs only.
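
For implementers in other languages, the following sketch shows one way these accessors could be derived from descriptor data. The structs are hypothetical, simplified models written for this example (plain data members instead of the real accessor methods), not the actual protobuf classes, and extensions and other edge cases are ignored.

```c++
#include <vector>

// Hypothetical, simplified descriptor types; not the real protobuf classes.
struct FieldDescriptor;

struct OneofDescriptor {
  std::vector<const FieldDescriptor*> fields;
  // Synthetic means: exactly one field, and that field is proto3 optional.
  bool is_synthetic() const;
};

struct FieldDescriptor {
  bool is_repeated = false;
  bool is_message = false;
  bool file_is_proto2 = false;
  bool proto3_optional = false;  // Kept internal; not exposed to users.
  const OneofDescriptor* containing_oneof = nullptr;

  // True for message fields, oneof members (real or synthetic), and proto2
  // singular fields. Repeated fields never have presence.
  bool has_presence() const {
    if (is_repeated) return false;
    return is_message || containing_oneof != nullptr || file_is_proto2;
  }

  // Like containing_oneof, but hides synthetic oneofs.
  const OneofDescriptor* real_containing_oneof() const {
    if (containing_oneof == nullptr || containing_oneof->is_synthetic()) {
      return nullptr;
    }
    return containing_oneof;
  }
};

inline bool OneofDescriptor::is_synthetic() const {
  return fields.size() == 1 && fields[0]->proto3_optional;
}

struct Descriptor {
  std::vector<const OneofDescriptor*> oneofs;  // Real oneofs come first.

  int real_oneof_decl_count() const {
    int count = 0;
    for (const OneofDescriptor* oneof : oneofs) {
      if (!oneof->is_synthetic()) ++count;
    }
    return count;
  }
};
```

Note that a proto3 `optional` field gets presence here purely through its synthetic oneof membership, so `has_presence()` never needs to consult `proto3_optional` directly.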

### Implementation Changes

Proto3 `optional` fields and synthetic oneofs must work correctly when
reflected on. Specifically:

1. Reflection for synthetic oneofs should work properly. Even though synthetic
   oneofs do not really exist in the message, you can still make reflection work
   as if they did. In particular, you can make a method like
   `Reflection::HasOneof()` or `Reflection::GetOneofFieldDescriptor()` look at
   the hasbit to determine if the oneof is present or not.
2. Reflection for proto3 optional fields should work properly. For example, a
   method like `Reflection::HasField()` should know to look for the hasbit for a
   proto3 `optional` field. It should not be fooled by the synthetic oneof into
   thinking that there is a `case` member for the oneof.
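
A minimal sketch of both points, using a hypothetical in-memory layout: `OneofInfo`, `FieldInfo`, and `Message` are invented for this example, and the free functions only borrow the names of the C++ `Reflection` methods to show the dispatch between hasbits and oneof case slots.

```c++
#include <cstdint>

// Hypothetical, simplified reflection model. A field either has a hasbit
// (including proto3 optional fields) or belongs to a real oneof with a case
// slot.
struct OneofInfo {
  bool is_synthetic;
  int case_slot;  // Index into Message::oneof_case; unused when synthetic.
};

struct FieldInfo {
  int number;
  int hasbit_index;        // Used when the field is not in a real oneof.
  const OneofInfo* oneof;  // nullptr, a real oneof, or a synthetic oneof.
};

struct Message {
  uint32_t hasbits = 0;
  int oneof_case[4] = {0, 0, 0, 0};  // Field number currently set, or 0.
};

bool HasField(const Message& msg, const FieldInfo& field) {
  if (field.oneof != nullptr && !field.oneof->is_synthetic) {
    return msg.oneof_case[field.oneof->case_slot] == field.number;
  }
  // Proto3 optional fields land here: no case member exists for them.
  return (msg.hasbits & (1u << field.hasbit_index)) != 0;
}

// Returns the field currently set in `oneof`, or nullptr if none is set.
// `fields` lists all fields of the message. For a synthetic oneof the answer
// comes from the hasbit of its single member.
const FieldInfo* GetOneofFieldDescriptor(const Message& msg,
                                         const OneofInfo& oneof,
                                         const FieldInfo* fields, int count) {
  for (int i = 0; i < count; ++i) {
    if (fields[i].oneof == &oneof && HasField(msg, fields[i])) {
      return &fields[i];
    }
  }
  return nullptr;
}
```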

Once you have updated reflection to work properly with proto3 `optional` and
synthetic oneofs, any code that *uses* your reflection interface should work
properly with no changes. This is the benefit of using synthetic oneofs.

In particular, if you have a reflection-based implementation of protobuf text
format or JSON, it should properly support proto3 optional fields without any
changes to the code. The fields will look like they all belong to a one-field
oneof, and existing proto3 reflection code should know how to test presence for
fields in a oneof.

So the best way to test your reflection changes is to try round-tripping a
message through text format, JSON, or some other reflection-based parser and
serializer, if you have one.

### Validating Descriptors

If your reflection implementation supports loading descriptors at runtime,
you must verify that all synthetic oneofs are ordered after all "real" oneofs.

Here is the code that implements this validation step in C++, for inspiration:

```c++
// Validation that runs for each message.
// Synthetic oneofs must be last.
int first_synthetic = -1;
for (int i = 0; i < message->oneof_decl_count(); i++) {
  const OneofDescriptor* oneof = message->oneof_decl(i);
  if (oneof->is_synthetic()) {
    if (first_synthetic == -1) {
      first_synthetic = i;
    }
  } else {
    if (first_synthetic != -1) {
      AddError(message->full_name(), proto.oneof_decl(i),
               DescriptorPool::ErrorCollector::OTHER,
               "Synthetic oneofs must be after all other oneofs");
    }
  }
}

if (first_synthetic == -1) {
  message->real_oneof_decl_count_ = message->oneof_decl_count_;
} else {
  message->real_oneof_decl_count_ = first_synthetic;
}
```