# How To Implement Field Presence for Proto3

Protobuf release 3.12 adds experimental support for `optional` fields in
proto3. Proto3 optional fields track presence like in proto2. For background
information about what presence tracking means, please see
[docs/field_presence](field_presence.md).

## Document Summary

This document is targeted at developers who own or maintain protobuf code
generators. All code generators will need to be updated to support proto3
optional fields. First-party code generators developed by Google are being
updated already. However, third-party code generators will need to be updated
independently by their authors. This includes:

- implementations of Protocol Buffers for other languages.
- alternate implementations of Protocol Buffers that target specialized use
  cases.
- RPC code generators that create generated APIs for service calls.
- code generators that implement some utility code on top of protobuf generated
  classes.

While this document speaks in terms of "code generators", these same principles
apply to implementations that dynamically generate a protocol buffer API "on the
fly", directly from a descriptor, in languages that support this kind of usage.

## Background

Presence tracking was added to proto3 in response to user feedback, both from
inside Google and [from open-source
users](https://github.com/protocolbuffers/protobuf/issues/1606). The [proto3
wrapper
types](https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/wrappers.proto)
were previously the only supported presence mechanism for proto3. Users have
pointed to both efficiency and usability issues with the wrapper types.

Presence in proto3 uses exactly the same syntax and semantics as in proto2.
Proto3 fields marked `optional` will track presence like proto2, while fields
without any label (known as "singular fields") will continue to omit presence
information. The `optional` keyword was chosen to minimize differences with
proto2.

Unfortunately, for the current descriptor protos and `Descriptor` API (as of
3.11.4) it is not possible to use the same representation as proto2. Proto3
descriptors already use `LABEL_OPTIONAL` for proto3 singular fields, which do
not track presence. There is a lot of existing code that reflects over proto3
protos and assumes that `LABEL_OPTIONAL` in proto3 means "no presence." Changing
the semantics now would be risky, since old software would likely drop proto3
presence information, which would be a data loss bug.

To minimize this risk we chose a descriptor representation that is semantically
compatible with existing proto3 reflection. Every proto3 optional field is
placed into a one-field `oneof`. We call this a "synthetic" oneof, as it was not
present in the source `.proto` file.

Since oneof fields in proto3 already track presence, existing proto3
reflection-based algorithms should correctly preserve presence for proto3
optional fields with no code changes. For example, the JSON and TextFormat
parsers/serializers in C++ and Java did not require any changes to support
proto3 presence. This is the major benefit of synthetic oneofs.

This design does leave some cruft in descriptors. Synthetic oneofs are a
compatibility measure that we can hopefully clean up in the future. For now
though, it is important to preserve them across different descriptor formats and
APIs. It is never safe to drop synthetic oneofs from a proto schema. Code
generators can (and should) skip synthetic oneofs when generating a user-facing
API or user-facing documentation. But for any schema representation that is
consumed programmatically, it is important to keep the synthetic oneofs around.

In APIs it can be helpful to offer separate accessors that refer to "real"
oneofs (see [API Changes](#api-changes) below). This is a convenient way to omit
synthetic oneofs in code generators.

## Updating a Code Generator

When a user adds an `optional` field to proto3, this is internally rewritten as
a one-field oneof, for backward-compatibility with reflection-based algorithms:

```protobuf
syntax = "proto3";

message Foo {
  // Experimental feature, not generally supported yet!
  optional int32 foo = 1;

  // Internally rewritten to:
  // oneof _foo {
  //   int32 foo = 1 [proto3_optional=true];
  // }
  //
  // We call _foo a "synthetic" oneof, since it was not created by the user.
}
```

As a result, the two main goals when updating a code generator are:

1. Give `optional` fields like `foo` normal field presence, as described in
   [docs/field_presence](field_presence.md). If your implementation already
   supports proto2, a proto3 `optional` field should use exactly the same API
   and internal implementation as proto2 `optional`.
2. Avoid generating any oneof-based accessors for the synthetic oneof. Its only
   purpose is to make reflection-based algorithms work properly if they are
   not aware of proto3 presence. The synthetic oneof should not appear anywhere
   in the generated API.

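To make the rewrite above more concrete, here is a minimal sketch of it on a
toy schema model. `FieldSketch`, `MessageSketch`, and `AddSyntheticOneofs()` are
hypothetical names invented for this illustration and are not part of the
protobuf API. The sketch assumes the `_foo` naming shown in the example and
ignores name collisions, which a real compiler has to handle.

```c++
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for descriptor protos.
struct FieldSketch {
  std::string name;
  bool proto3_optional = false;  // Mirrors FieldDescriptorProto.proto3_optional.
  int oneof_index = -1;          // -1 means "not a member of any oneof".
};

struct MessageSketch {
  std::vector<FieldSketch> fields;
  std::vector<std::string> oneof_names;  // Real oneofs first, synthetic last.
};

// Wraps every proto3 optional field in its own one-field oneof. The new oneofs
// are appended, so they always come after the oneofs the user declared.
void AddSyntheticOneofs(MessageSketch& message) {
  for (FieldSketch& field : message.fields) {
    if (field.proto3_optional && field.oneof_index == -1) {
      field.oneof_index = static_cast<int>(message.oneof_names.size());
      // Assumption: "_" + field name, as in the `_foo` example above.
      message.oneof_names.push_back("_" + field.name);
    }
  }
}
```

Because the synthetic oneofs are appended, oneofs written by the user keep
their original indices.
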
### Satisfying the Experimental Check

If you try to run `protoc` on a file with proto3 `optional` fields, you will get
an error because the feature is still experimental:

```
$ cat test.proto
syntax = "proto3";

message Foo {
  // Experimental feature, not generally supported yet!
  optional int32 a = 1;
}
$ protoc --cpp_out=. test.proto
test.proto: This file contains proto3 optional fields, but --experimental_allow_proto3_optional was not set.
```

There are two options for getting around this error:

1. Pass `--experimental_allow_proto3_optional` to protoc.
2. Make your filename (or a directory name) contain the string
   `test_proto3_optional`. This indicates that the proto file is specifically
   for testing proto3 optional support, so the check is suppressed.

These options are demonstrated below:

```
# One option:
$ protoc test.proto --cpp_out=. --experimental_allow_proto3_optional

# Another option:
$ cp test.proto test_proto3_optional.proto
$ protoc test_proto3_optional.proto --cpp_out=.
$
```

The experimental check will be removed in a future release, once we are ready
to make this feature generally available. Ideally this will happen for the 3.13
release of protobuf, sometime in mid-2020, but there is not a specific date set
for this yet. Some of the timing will depend on feedback we get from the
community, so if you have questions or concerns please get in touch via a
GitHub issue.

### Signaling That Your Code Generator Supports Proto3 Optional

If you now try to invoke your own code generator with the test proto, you will
run into a different error:

```
$ protoc test_proto3_optional.proto --my_codegen_out=.
test_proto3_optional.proto: is a proto3 file that contains optional fields, but
code generator --my_codegen_out hasn't been updated to support optional fields in
proto3. Please ask the owner of this code generator to support proto3 optional.
```

This check exists to make sure that code generators get a chance to update
before they are used with proto3 `optional` fields. Without this check an old
code generator might emit obsolete generated APIs (like accessors for a
synthetic oneof) and users could start depending on these. That would create
a legacy migration burden once a code generator actually implements the feature.

To signal that your code generator supports `optional` fields in proto3, you
need to tell `protoc` what features you support. The method for doing this
depends on whether you are using the C++
`google::protobuf::compiler::CodeGenerator`
framework or not.

If you are using the CodeGenerator framework:

```c++
class MyCodeGenerator : public google::protobuf::compiler::CodeGenerator {
  // Add this method.
  uint64_t GetSupportedFeatures() const override {
    // Indicate that this code generator supports proto3 optional fields.
    // (Note: don't release your code generator with this flag set until you
    // have actually added and tested your proto3 support!)
    return FEATURE_PROTO3_OPTIONAL;
  }
};
```

If you are generating code using raw `CodeGeneratorRequest` and
`CodeGeneratorResponse` messages from `plugin.proto`, the change will be very
similar:

```c++
void GenerateResponse() {
  CodeGeneratorResponse response;
  response.set_supported_features(CodeGeneratorResponse::FEATURE_PROTO3_OPTIONAL);

  // Generate code...
}
```

Once you have added this, you should now be able to successfully use your code
generator to generate a file containing proto3 optional fields:

```
$ protoc test_proto3_optional.proto --my_codegen_out=.
```

### Updating Your Code Generator

The next step is to actually add support for proto3 optional to your code
generator. The goal is to recognize proto3 optional fields as optional, and
suppress any output from synthetic oneofs.

If your code generator does not currently support proto2, you will need to
design an API and implementation for supporting presence in scalar fields.
Generally this means:

- allocating a bit inside the generated class to represent whether a given field
  is present or not.
- exposing a `has_foo()` method for each field to return the value of this bit.
- making the parser set this bit when a value is parsed from the wire.
- making the serializer test this bit to decide whether to serialize.

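The four points above could look roughly like the following hand-written
sketch of a generated class for `optional int32 foo = 1;`. Everything here is
illustrative: the class layout, the `kFooBit` constant, and especially the toy
"wire format" (a vector holding zero or one values) are assumptions made for
this example, not the output of any real generator or the real protobuf
encoding.

```c++
#include <cstdint>
#include <vector>

// Hypothetical generated class for:  optional int32 foo = 1;
class Foo {
 public:
  // Exposes the presence bit.
  bool has_foo() const { return (has_bits_ & kFooBit) != 0; }
  std::int32_t foo() const { return foo_; }

  void set_foo(std::int32_t value) {
    foo_ = value;
    has_bits_ |= kFooBit;  // Setting a value always marks the field present.
  }
  void clear_foo() {
    foo_ = 0;
    has_bits_ &= ~kFooBit;
  }

  // Toy parser: the field is present only if a value was seen on the "wire".
  void ParseFrom(const std::vector<std::int32_t>& wire) {
    clear_foo();
    if (!wire.empty()) set_foo(wire.back());
  }

  // Toy serializer: tests the presence bit, not the value, so an explicitly
  // set zero is still written.
  std::vector<std::int32_t> Serialize() const {
    std::vector<std::int32_t> wire;
    if (has_foo()) wire.push_back(foo_);
    return wire;
  }

 private:
  static constexpr std::uint32_t kFooBit = 1u << 0;
  std::uint32_t has_bits_ = 0;  // One bit per field with presence.
  std::int32_t foo_ = 0;
};
```

The point of the sketch is that `set_foo(0)` followed by a serialize and parse
round trip still reports `has_foo() == true`, which is what distinguishes an
`optional` field from a proto3 singular field.
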
If your code generator already supports proto2, then most of your work is
already done. All you need to do is make sure that proto3 optional fields have
exactly the same API and behave in exactly the same way as proto2 optional
fields.

From experience updating several of Google's code generators, most of the
updates that are required fall into one of several patterns. Here we will show
the patterns in terms of the C++ CodeGenerator framework. If you are using
`CodeGeneratorRequest` and `CodeGeneratorResponse` directly, you can translate
the C++ examples to your own language, referencing the C++ implementation of
these methods where required.

#### To test whether a field should have presence

Old:

```c++
bool MessageHasPresence(const google::protobuf::FieldDescriptor* field) {
  return field->has_presence();
}
```

New:

```c++
// Presence is no longer a property of a message, it's a property of individual
// fields.
bool FieldHasPresence(const google::protobuf::FieldDescriptor* field) {
  return field->has_presence();
  // Note, the above will return true for fields in a oneof.
  // If you want to filter out oneof fields, write this instead:
  //   return field->has_presence() && !field->real_containing_oneof();
}
```

#### To test whether a field is a member of a oneof

Old:

```c++
bool FieldIsInOneof(const google::protobuf::FieldDescriptor* field) {
  return field->containing_oneof() != nullptr;
}
```

New:

```c++
bool FieldIsInOneof(const google::protobuf::FieldDescriptor* field) {
  // real_containing_oneof() returns nullptr for synthetic oneofs.
  return field->real_containing_oneof() != nullptr;
}
```

#### To iterate over all oneofs

Old:

```c++
void IterateOverOneofs(const google::protobuf::Descriptor* message) {
  for (int i = 0; i < message->oneof_decl_count(); i++) {
    const google::protobuf::OneofDescriptor* oneof = message->oneof_decl(i);
    // ...
  }
}
```

New:

```c++
void IterateOverOneofs(const google::protobuf::Descriptor* message) {
  // Real oneofs are always first, and real_oneof_decl_count() will return the
  // total number of oneofs, excluding synthetic oneofs.
  for (int i = 0; i < message->real_oneof_decl_count(); i++) {
    const google::protobuf::OneofDescriptor* oneof = message->oneof_decl(i);
    // ...
  }
}
```

## Updating Reflection

If your implementation offers reflection, there are a few other changes to make:

### API Changes

The API for reflecting over fields and oneofs should make the following changes.
These match the changes implemented in C++ reflection.

1. Add a `FieldDescriptor::has_presence()` method returning `bool`
   (adjusted to your language's naming convention). This should return true
   for all fields that have explicit presence, as documented in
   [docs/field_presence](field_presence.md). In particular, this includes
   fields in a oneof, proto2 scalar fields, and proto3 `optional` fields.
   This accessor will allow users to query what fields have presence without
   thinking about the difference between proto2 and proto3.
2. As a corollary of (1), please do *not* expose an accessor for the
   `FieldDescriptorProto.proto3_optional` field. We want to avoid having
   users implement any proto2/proto3-specific logic. Users should use the
   `has_presence()` function instead.
3. You may also wish to add a `FieldDescriptor::has_optional_keyword()` method
   returning `bool`, which indicates whether the `optional` keyword is present.
   Message fields always return `true` for `has_presence()`, so this method
   lets a caller determine whether the schema author actually wrote `optional`.
   It can occasionally be useful to have this information, even though it does
   not change the presence semantics of the field.
4. If your reflection API may be used for a code generator, you may wish to
   implement methods to help users tell the difference between real and
   synthetic oneofs. In particular:
   - `OneofDescriptor::is_synthetic()`: returns true if this is a synthetic
     oneof.
   - `FieldDescriptor::real_containing_oneof()`: like `containing_oneof()`,
     but returns `nullptr` if the oneof is synthetic.
   - `Descriptor::real_oneof_decl_count()`: like `oneof_decl_count()`, but
     returns the number of real oneofs only.

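As a rough illustration of how these accessors relate to each other, the
sketch below models them on toy descriptor structs. `OneofInfo`, `FieldInfo`,
and `MessageInfo` are hypothetical types invented for this example, and the
presence rule in `has_presence()` is a simplified assumption (not repeated, and
either message-typed, in any oneof, or declared in a proto2 file). It is not
the actual C++ implementation.

```c++
#include <string>
#include <vector>

// Hypothetical descriptor model; not the real protobuf classes.
struct OneofInfo {
  std::string name;
  bool synthetic = false;
  bool is_synthetic() const { return synthetic; }
};

struct FieldInfo {
  std::string name;
  bool repeated = false;
  bool message_typed = false;
  bool in_proto2_file = false;
  const OneofInfo* oneof = nullptr;  // Real or synthetic oneof, if any.

  const OneofInfo* containing_oneof() const { return oneof; }

  // Like containing_oneof(), but hides synthetic oneofs from the caller.
  const OneofInfo* real_containing_oneof() const {
    return (oneof != nullptr && !oneof->is_synthetic()) ? oneof : nullptr;
  }

  // A proto3 optional field has presence here because it sits in a
  // (synthetic) oneof, so no proto3_optional accessor is needed.
  bool has_presence() const {
    if (repeated) return false;
    return message_typed || oneof != nullptr || in_proto2_file;
  }
};

struct MessageInfo {
  std::vector<OneofInfo> oneofs;  // Real oneofs first, synthetic last.

  int oneof_decl_count() const { return static_cast<int>(oneofs.size()); }

  int real_oneof_decl_count() const {
    int count = 0;
    for (const OneofInfo& oneof : oneofs) {
      if (!oneof.is_synthetic()) ++count;
    }
    return count;
  }
};
```

In this model a code generator only ever calls the `real_*` accessors, while
reflection-based code keeps using `containing_oneof()` and
`oneof_decl_count()` unchanged.
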
### Implementation Changes

Proto3 `optional` fields and synthetic oneofs must work correctly when
reflected on. Specifically:

1. Reflection for synthetic oneofs should work properly. Even though synthetic
   oneofs do not really exist in the message, you can still make reflection work
   as if they did. In particular, you can make a method like
   `Reflection::HasOneof()` or `Reflection::GetOneofFieldDescriptor()` look at
   the hasbit to determine if the oneof is present or not.
2. Reflection for proto3 optional fields should work properly. For example, a
   method like `Reflection::HasField()` should know to look for the hasbit for a
   proto3 `optional` field. It should not be fooled by the synthetic oneof into
   thinking that there is a `case` member for the oneof.

Once you have updated reflection to work properly with proto3 `optional` and
synthetic oneofs, any code that *uses* your reflection interface should work
properly with no changes. This is the benefit of using synthetic oneofs.

In particular, if you have a reflection-based implementation of protobuf text
format or JSON, it should properly support proto3 optional fields without any
changes to the code. The fields will look like they all belong to a one-field
oneof, and existing proto3 reflection code should know how to test presence for
fields in a oneof.

So the best way to test your reflection changes is to try round-tripping a
message through text format, JSON, or some other reflection-based parser and
serializer, if you have one.

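The two reflection requirements in this section can be sketched on a toy
message as follows. `FooMessage`, `HasField()`, and `GetOneofFieldNumber()` are
hypothetical free functions written for this example. They stand in for methods
such as `Reflection::HasField()` and `Reflection::GetOneofFieldDescriptor()`
and do not reproduce their real signatures.

```c++
#include <cstdint>

// Hypothetical message with one proto3 optional field `foo` (field number 1),
// which belongs to the synthetic oneof `_foo` (oneof index 0).
struct FooMessage {
  std::uint32_t has_bits = 0;
  std::int32_t foo = 0;
  // Note: there is no stored `case` member for the synthetic oneof.
};

constexpr int kFooFieldNumber = 1;
constexpr std::uint32_t kFooHasBit = 1u << 0;

// Presence of a proto3 optional field comes from its hasbit.
bool HasField(const FooMessage& message, int field_number) {
  return field_number == kFooFieldNumber &&
         (message.has_bits & kFooHasBit) != 0;
}

// Returns the field number of the member that is set in the given oneof, or 0
// if none is set. For the synthetic oneof the answer is derived from the
// hasbit of its single member.
int GetOneofFieldNumber(const FooMessage& message, int oneof_index) {
  if (oneof_index != 0) return 0;
  return HasField(message, kFooFieldNumber) ? kFooFieldNumber : 0;
}
```

Existing reflection-based code that asks "which member of this oneof is set?"
then gets a consistent answer without knowing the oneof is synthetic.
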
### Validating Descriptors

If your reflection implementation supports loading descriptors at runtime,
you must verify that all synthetic oneofs are ordered after all "real" oneofs.

Here is the code that implements this validation step in C++, for inspiration:

```c++
  // Validation that runs for each message.
  // Synthetic oneofs must be last.
  int first_synthetic = -1;
  for (int i = 0; i < message->oneof_decl_count(); i++) {
    const OneofDescriptor* oneof = message->oneof_decl(i);
    if (oneof->is_synthetic()) {
      if (first_synthetic == -1) {
        first_synthetic = i;
      }
    } else {
      if (first_synthetic != -1) {
        AddError(message->full_name(), proto.oneof_decl(i),
                 DescriptorPool::ErrorCollector::OTHER,
                 "Synthetic oneofs must be after all other oneofs");
      }
    }
  }

  if (first_synthetic == -1) {
    message->real_oneof_decl_count_ = message->oneof_decl_count_;
  } else {
    message->real_oneof_decl_count_ = first_synthetic;
  }
```
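For implementations in other languages, the same check can be expressed
without any protobuf types. The following is a self-contained sketch that takes
one flag per oneof, in declaration order. `CountRealOneofs()` is a hypothetical
helper name, and returning `-1` on error is a choice made for this example
instead of the `AddError()` call used above.

```c++
#include <vector>

// Takes one flag per oneof in declaration order (true = synthetic).
// Returns the number of real oneofs, or -1 if a real oneof appears after a
// synthetic one, which makes the descriptor invalid.
int CountRealOneofs(const std::vector<bool>& is_synthetic) {
  int first_synthetic = -1;
  for (int i = 0; i < static_cast<int>(is_synthetic.size()); i++) {
    if (is_synthetic[i]) {
      if (first_synthetic == -1) first_synthetic = i;
    } else if (first_synthetic != -1) {
      return -1;  // Synthetic oneofs must be after all other oneofs.
    }
  }
  // With no synthetic oneofs, every oneof is real.
  return first_synthetic == -1 ? static_cast<int>(is_synthetic.size())
                               : first_synthetic;
}
```
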