upb/json: fix sign-extended index in jsondec_base64_tablelookup (#27215)
## Problem
`jsondec_base64_tablelookup()` in `upb/json/decode.c` indexes a 256-entry `signed char` table with `table[(unsigned)ch]`. When `ch` is a signed character type, any input byte with the high bit set (`0x80..0xFF`) holds a negative value. Converting that value to `unsigned` sign-extends it, so the index becomes a huge number (`0xFFFFFF80..0xFFFFFFFF` with a 32-bit `unsigned`) instead of `0x80..0xFF`. The lookup then reads approximately 4 GiB past the table's base, which is an out-of-bounds read.
The same pattern exists in the amalgamated copies that the Ruby and PHP extensions ship:
- `ruby/ext/google/protobuf_c/ruby-upb.c`
- `php/ext/google/protobuf/php-upb.c`
## Fix
Cast through `unsigned char` so the byte maps to `[0x80, 0xFF]` instead of being sign-extended before it is used as a table index. The change is one added keyword on one line in each of three files.
```diff
- return table[(unsigned)ch];
+ return table[(unsigned char)ch];
```
## Compatibility
- `ch` in `[0x00, 0x7F]`: `(unsigned)ch` and `(unsigned char)ch` produce identical values — no behavior change.
- `ch` in `[0x80, 0xFF]`: previously an out-of-bounds read. With the fix, the lookup returns the table's `-1` sentinel, which `jsondec_base64()` already handles as "invalid base64 char" via `if (val < 0)`.
No public API changes, no new allocations, no new branches.
## Test plan
- Adds `optional bytes data = 11;` to `upb_test.Box` in `upb/json/test.proto` so a `bytes`-typed field is reachable from the existing `JsonDecode` helper in `decode_test.cc`.
- Adds `TEST(JsonTest, RejectsBase64WithHighBitBytes)` to `upb/json/decode_test.cc`, which decodes a JSON object whose `data` string consists of bytes with the high bit set (not printable here, so they render as `{"data":"����"}`) and verifies that the decoder fails gracefully (no crash, returns nullptr). On the unfixed code under ASan this test exhibits the OOB read.
- Existing `upb/json/decode_test.cc` cases continue to pass.
- **Verified locally on `f331eba78` with `bazel test //upb/json:decode_test`**: with the fix all 5 tests pass; with the fix reverted (test kept), the new test fails with **SIGBUS** in `jsondec_base64_tablelookup` while the other 4 still pass — confirming the test exercises the exact code path the fix repairs.
## Files changed
| File | Change |
|---|---|
| `upb/json/decode.c` | Cast fix (`unsigned` to `unsigned char`) |
| `ruby/ext/google/protobuf_c/ruby-upb.c` | Same fix in amalgamated copy |
| `php/ext/google/protobuf/php-upb.c` | Same fix in amalgamated copy |
| `upb/json/test.proto` | `+optional bytes data = 11;` (test-only) |
| `upb/json/decode_test.cc` | Regression test |
If the project regenerates `ruby-upb.c` / `php-upb.c` from `upb/json/decode.c` automatically, please let me know and I will drop those two files from the PR.
## Reference
Reported via Google Bug Hunters / OSS VRP.
Closes #27215
COPYBARA_INTEGRATE_REVIEW=https://github.com/protocolbuffers/protobuf/pull/27215 from sukhoon0975:fix/upb-json-base64-sign-extend 18d34d7b070d2a4c3e2b80b18aba43d8132eddbc
PiperOrigin-RevId: 915053186
Copyright 2008 Google LLC
Protocol Buffers (a.k.a., protobuf) are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data. You can learn more about it in protobuf's documentation.
This README file contains protobuf installation instructions. To install protobuf, you need to install the protocol compiler (used to compile .proto files) and the protobuf runtime for your chosen programming language.
Most users will find working from supported releases to be the easiest path.
If you choose to work from the head revision of the main branch, your build will occasionally be broken by source-incompatible changes and by insufficiently tested (and therefore broken) behavior.
If you are using C++ or otherwise need to build protobuf from source as a part of your project, you should pin to a release commit on a release branch.
This is because even release branches can experience some instability in between release commits.
Protobuf supports Bzlmod with Bazel 8+. Users should specify a dependency on protobuf in their MODULE.bazel file as follows:
```
bazel_dep(name = "protobuf", version = <VERSION>)
```
Users can optionally override the repo name, such as for compatibility with WORKSPACE.
```
bazel_dep(name = "protobuf", version = <VERSION>, repo_name = "com_google_protobuf")
```
Users can also add the following to their legacy WORKSPACE file.
Note that with the release of 30.x there are a few more load statements to properly set up rules_java and rules_python.
```
http_archive(
    name = "com_google_protobuf",
    strip_prefix = "protobuf-VERSION",
    sha256 = ...,
    url = ...,
)

load("@com_google_protobuf//:protobuf_deps.bzl", "protobuf_deps")

protobuf_deps()

load("@rules_java//java:rules_java_deps.bzl", "rules_java_dependencies")

rules_java_dependencies()

load("@rules_java//java:repositories.bzl", "rules_java_toolchains")

rules_java_toolchains()

load("@rules_python//python:repositories.bzl", "py_repositories")

py_repositories()
```
The protobuf compiler is written in C++. If you are using C++, please follow the C++ Installation Instructions to install protoc along with the C++ runtime.
For non-C++ users, the simplest way to install the protocol compiler is to download a pre-built binary from our GitHub release page.
In the downloads section of each release, you can find pre-built binaries in zip packages: protoc-$VERSION-$PLATFORM.zip. Each package contains the protoc binary as well as a set of standard .proto files distributed along with protobuf.
If you are looking for an old version that is not available in the release page, check out the Maven repository.
These pre-built binaries are only provided for released versions. If you want to use the GitHub main version at HEAD, or you need to modify protobuf code, or you are using C++, it's recommended to build your own protoc binary from source.
If you would like to build the protoc binary from source, see the C++ Installation Instructions.
Protobuf supports several different programming languages. For each programming language, you can find instructions in the corresponding source directory about how to install the protobuf runtime for that specific language:
| Language | Source |
|---|---|
| C++ (includes C++ runtime and protoc) | src |
| Java | java |
| Python | python |
| Objective-C | objectivec |
| C# | csharp |
| Ruby | ruby |
| Go | protocolbuffers/protobuf-go |
| PHP | php |
| Dart | dart-lang/protobuf |
| JavaScript | protocolbuffers/protobuf-javascript |
The best way to learn how to use protobuf is to follow the tutorials in our developer guide.
If you want to learn from code examples, take a look at the examples in the examples directory.
The complete documentation is available at the Protocol Buffers doc site.
Read about our version support policy to stay current on support timeframes for the language libraries.
To be alerted to upcoming changes in Protocol Buffers and connect with protobuf developers and users, join the Google Group.