Added fuzzer for descriptor parsing/serialization, and fixed several bugs.

The initial motivation for this change was to fix a bug found by fuzzing. The old fuzz test (built on `cc_fuzz_target()`) detected an infinite loop that occurs when a bytes field default contains an unterminated `\x` escape.

To fix this bug while expanding fuzz coverage, I created a fuzz test that verifies that we can do a lossless round trip from descriptor -> DefPool -> descriptor.  We use C++ as the source of truth for whether a descriptor is valid or not, and what the canonical serialization back to protobuf form should be.

I wrote the new fuzz test using go/FuzzTest, which makes it easier and more readable to use an arbitrary `FileDescriptorSet` as input, while adding test cases for regressions.

The fuzz test highlighted a handful of errors that I subsequently fixed and added regression tests for:

1. The aforementioned unterminated `\x` bug.
2. We were not propagating the `edition` field.
3. We were missing the CheckIdent() check in a few places.
4. We were rejecting files with empty name, whereas C++ allows this.
5. There were a few bugs with escaping string defaults.
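
As an illustration of item 1, here is a minimal sketch of an unescaper that cannot loop forever on an unterminated `\x`. It is not upb's actual code: the name `UnescapeBytesDefault`, the supported escape subset, the two-digit limit on hex escapes, and the choice to reject (rather than tolerate) a malformed escape are all assumptions made for the example. The key point is that every loop iteration consumes at least one input byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the value of a hex digit, or -1 if `c` is not a hex digit.
int HexDigitValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical helper (not a upb API): unescapes a C-style escaped string.
// Returns false on a malformed escape instead of spinning or reading past
// the end of the input.
bool UnescapeBytesDefault(const std::string& in, std::string* out) {
  out->clear();
  std::size_t i = 0;
  while (i < in.size()) {
    char c = in[i++];  // Always advances, so the loop must terminate.
    if (c != '\\') {
      out->push_back(c);
      continue;
    }
    if (i == in.size()) return false;  // Trailing backslash.
    char e = in[i++];
    switch (e) {
      case 'n': out->push_back('\n'); break;
      case 't': out->push_back('\t'); break;
      case 'r': out->push_back('\r'); break;
      case '\\': out->push_back('\\'); break;
      case '\'': out->push_back('\''); break;
      case '"': out->push_back('"'); break;
      case 'x': {
        // An unterminated `\x` (no hex digit follows) is an error here.
        if (i == in.size() || HexDigitValue(in[i]) < 0) return false;
        unsigned value = 0;
        // Assumption: consume at most two hex digits per escape.
        for (int n = 0; n < 2 && i < in.size() && HexDigitValue(in[i]) >= 0;
             ++n, ++i) {
          value = (value << 4) | static_cast<unsigned>(HexDigitValue(in[i]));
        }
        out->push_back(static_cast<char>(value));
        break;
      }
      default:
        return false;  // Escape not covered by this sketch.
    }
  }
  return true;
}
```

Inputs such as `\x` at the end of the string, or `\x` followed by a non-hex character, return `false` immediately rather than leaving the cursor stuck.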

Because FuzzTest is Clang-only, I split the `FUZZ_TEST()` invocation from the regression tests: the latter are portable and should run on all platforms. Only `FUZZ_TEST()` itself is in a google3/Clang-only file.
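
A rough sketch of that split, using a toy hex codec in place of the real descriptor -> DefPool -> descriptor round trip. All names here (`HexEncode`, `HexDecode`, `RoundTrips`, `RunRegressionTests`, and the `EXAMPLE_HAVE_FUZZTEST` macro) are hypothetical. In this change the `FUZZ_TEST()` registration lives in a separate file; the preprocessor guard below only stands in for that separation so the example fits in one block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy codec standing in for "serialize back to canonical form".
std::string HexEncode(const std::string& bytes) {
  static const char kDigits[] = "0123456789abcdef";
  std::string hex;
  hex.reserve(bytes.size() * 2);
  for (unsigned char b : bytes) {
    hex.push_back(kDigits[b >> 4]);
    hex.push_back(kDigits[b & 0x0f]);
  }
  return hex;
}

int HexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Toy decoder standing in for "parse". Returns false on malformed input.
bool HexDecode(const std::string& hex, std::string* out) {
  if (hex.size() % 2 != 0) return false;
  out->clear();
  for (std::size_t i = 0; i < hex.size(); i += 2) {
    int hi = HexValue(hex[i]);
    int lo = HexValue(hex[i + 1]);
    if (hi < 0 || lo < 0) return false;
    out->push_back(static_cast<char>((hi << 4) | lo));
  }
  return true;
}

// The property under test: encoding then decoding yields the input again.
// It is a plain function, so it does not depend on any fuzzing framework.
bool RoundTrips(const std::string& bytes) {
  std::string decoded;
  return HexDecode(HexEncode(bytes), &decoded) && decoded == bytes;
}

// Portable regression tests: fixed inputs (for example, ones a fuzzer found
// earlier) fed to the same property. These build with any compiler.
bool RunRegressionTests() {
  return RoundTrips("") && RoundTrips("\\x") &&
         RoundTrips(std::string("\0\xff", 2));
}

#if defined(EXAMPLE_HAVE_FUZZTEST)  // Hypothetical macro; Clang-only part.
#include "fuzztest/fuzztest.h"
#include "gtest/gtest.h"

void RoundTripIsLossless(const std::string& bytes) {
  EXPECT_TRUE(RoundTrips(bytes));
}
FUZZ_TEST(HexCodecFuzz, RoundTripIsLossless);
#endif
```

Keeping the property free of framework dependencies means each regression case is a single call, and the fuzzer-specific registration is the only part that needs the restricted toolchain.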

PiperOrigin-RevId: 506997362
19 files changed
tree: b992b35f2c10ec03def749a544b447a1edf3f1ac
README.md

μpb: small, fast C protos

μpb (often written ‘upb’) is a small protobuf implementation written in C.

upb is the core runtime for protobuf language extensions in Ruby, PHP, and Python.

While upb offers a C API, the C API & ABI are not stable. For this reason, upb is not generally offered as a C library for direct consumption, and there are no releases.

Features

upb has comparable speed to protobuf C++, but is an order of magnitude smaller in code size.

Like the main protobuf implementation in C++, it supports:

  • a generated API (in C)
  • reflection
  • binary & JSON wire formats
  • text format serialization
  • all standard features of protobufs (oneofs, maps, unknown fields, extensions, etc.)
  • full conformance with the protobuf conformance tests

upb also supports some features that C++ does not:

  • optional reflection: generated messages are agnostic to whether reflection will be linked in or not.
  • no global state: no pre-main registration or other global state.
  • fast reflection-based parsing: messages loaded at runtime parse just as fast as compiled-in messages.

However, there are a few features it does not support:

  • text format parsing
  • deep descriptor verification: upb's descriptor validation is not as exhaustive as protoc's.

Install

For Ruby, use RubyGems:

$ gem install google-protobuf

For PHP, use PECL:

$ sudo pecl install protobuf

For Python, use PyPI:

$ sudo pip install protobuf

Alternatively, you can build and install upb using the vcpkg dependency manager:

git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg integrate install
./vcpkg install upb

The upb port in vcpkg is kept up to date by Microsoft team members and community contributors.

If the version is out of date, please create an issue or pull request on the vcpkg repository.

Contributing

Please see CONTRIBUTING.md.