# Introduction
**⚠️ WARNING: This API is highly experimental and subject to change. It is not
enabled by default and should not be used in production without understanding
the stability implications. The API may change significantly in future releases.**
The dependency API provides access to a glyph dependency graph that represents
how glyphs in an OpenType font reference or produce other glyphs. The graph
maps input glyphs to output glyphs through various OpenType mechanisms:
- **Character mapping (cmap)**: Maps Unicode codepoints to glyphs, including
Unicode Variation Sequences (UVS)
- **Glyph substitution (GSUB)**: Tracks which glyphs can be substituted for
other glyphs through OpenType Layout features
- **Composite glyphs (glyf, CFF)**: Records component glyphs used in composite
glyph construction (TrueType glyf composites and CFF1 SEAC)
- **Color layers (COLR)**: Identifies glyphs used as color layers
- **Math variants (MATH)**: Tracks mathematical variant glyphs
The dependency graph enables finding the transitive closure of all glyphs
reachable from a given input set, which is useful for font subsetting,
analyzing glyph coverage, and optimizing font delivery.
# Building with the Depend API
The dependency API is optional and controlled by the `depend_api` build option.
**This is a highly experimental feature and disabled by default.**
## Enabling the depend_api option
```bash
meson setup build -Ddepend_api=true
meson compile -C build
```
In code, check for availability:
```c
#ifdef HB_DEPEND_API
// Use dependency API
#endif
```
## Discovering available options
To see all available HarfBuzz build options, including `depend_api`:
```bash
# After setting up a build directory
meson configure build
# Or to see just the option values
meson introspect build --buildoptions | jq '.[] | select(.section == "user")'
```
You can also view all available options in the source file `meson_options.txt`.
**Important:** This API is experimental and may change without notice. Do not
use in production applications without being prepared for breaking changes in
future HarfBuzz releases.
# Usage
## Basic Example
```c
#include <stdio.h>
#include <hb.h>
hb_blob_t *blob = hb_blob_create_from_file("font.ttf");
hb_face_t *face = hb_face_create(blob, 0);
// Extract dependency graph
hb_depend_t *depend = hb_depend_from_face_or_fail(face);
if (!depend) {
  // Handle error; release the face and blob so they are not leaked
  hb_face_destroy(face);
  hb_blob_destroy(blob);
  return;
}
// Query dependencies for a specific glyph
hb_codepoint_t gid = 42; // Glyph to query
hb_codepoint_t index = 0;
hb_tag_t table_tag;
hb_codepoint_t dependent;
hb_tag_t layout_tag;
hb_codepoint_t ligature_set;
hb_codepoint_t context_set;
printf("Dependencies for glyph %u:\n", gid);
while (hb_depend_get_glyph_entry(depend, gid, index++,
                                 &table_tag, &dependent,
                                 &layout_tag, &ligature_set, &context_set, NULL)) {
  printf(" -> glyph %u via %c%c%c%c\n",
         dependent, HB_UNTAG(table_tag));
}
// Clean up
hb_depend_destroy(depend);
hb_face_destroy(face);
hb_blob_destroy(blob);
```
## Iterating Dependencies
To programmatically access dependency information for a specific glyph:
```c
// Assumes `depend` was created as in the basic example.
// HB_OT_TAG_GSUB is declared in <hb-ot.h>.
hb_codepoint_t gid = 42;  // Glyph ID to query
hb_codepoint_t index = 0; // Dependency entry index
hb_tag_t table_tag;
hb_codepoint_t dependent;
hb_tag_t layout_tag;
hb_codepoint_t ligature_set;
hb_codepoint_t context_set;
// Iterate through all dependencies for this glyph
while (hb_depend_get_glyph_entry(depend, gid, index,
                                 &table_tag, &dependent,
                                 &layout_tag, &ligature_set, &context_set, NULL)) {
  // Process dependency information
  if (table_tag == HB_OT_TAG_GSUB) {
    // GSUB dependency: layout_tag contains feature tag
    printf("GID %u -> %u via GSUB feature '%c%c%c%c'\n",
           gid, dependent, HB_UNTAG(layout_tag));
  } else if (table_tag == HB_TAG('c','m','a','p')) {
    // cmap dependency: layout_tag contains UVS codepoint if applicable
    printf("GID %u -> %u via cmap\n", gid, dependent);
  } else {
    // Other dependencies (glyf, CFF, COLR, MATH)
    printf("GID %u -> %u via %c%c%c%c\n",
           gid, dependent, HB_UNTAG(table_tag));
  }
  index++;
}
```
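The tag comparisons above work because an OpenType tag is four ASCII bytes packed into a 32-bit integer, first byte in the most significant position. The following self-contained sketch mirrors that packing without including HarfBuzz headers. `pack_tag` and `unpack_tag` are hypothetical helper names used only for illustration; they stand in for `HB_TAG` and `HB_UNTAG`.

```c
#include <stdint.h>

/* Hypothetical stand-in for HB_TAG: pack four ASCII bytes, with the first
 * byte in the most significant position. */
static uint32_t pack_tag(char a, char b, char c, char d)
{
  return ((uint32_t)(uint8_t)a << 24) |
         ((uint32_t)(uint8_t)b << 16) |
         ((uint32_t)(uint8_t)c << 8)  |
          (uint32_t)(uint8_t)d;
}

/* Hypothetical stand-in for HB_UNTAG: write the four tag bytes plus a NUL
 * terminator into out, which must hold at least 5 chars. */
static void unpack_tag(uint32_t tag, char out[5])
{
  out[0] = (char)((tag >> 24) & 0xFF);
  out[1] = (char)((tag >> 16) & 0xFF);
  out[2] = (char)((tag >> 8) & 0xFF);
  out[3] = (char)(tag & 0xFF);
  out[4] = '\0';
}
```

Because a tag is a plain integer, comparing `table_tag` with `==` is enough; no string comparison is needed.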
## Finding Reachable Glyphs
To compute all glyphs reachable from a starting set (transitive closure):
```c
void find_reachable_glyphs(hb_depend_t *depend, hb_set_t *reachable)
{
  hb_set_t *to_process = hb_set_create();
  hb_set_union(to_process, reachable);
  // Process glyphs until none remain
  while (!hb_set_is_empty(to_process)) {
    hb_codepoint_t gid = hb_set_get_min(to_process);
    hb_set_del(to_process, gid);
    hb_codepoint_t index = 0;
    hb_tag_t table_tag;
    hb_codepoint_t dependent;
    hb_tag_t layout_tag;
    hb_codepoint_t ligature_set;
    hb_codepoint_t context_set;
    while (hb_depend_get_glyph_entry(depend, gid, index++,
                                     &table_tag, &dependent,
                                     &layout_tag, &ligature_set, &context_set, NULL)) {
      if (!hb_set_has(reachable, dependent)) {
        hb_set_add(reachable, dependent);
        hb_set_add(to_process, dependent);
      }
    }
  }
  hb_set_destroy(to_process);
}
// Usage:
hb_set_t *reachable = hb_set_create();
hb_set_add(reachable, 100);
hb_set_add(reachable, 101);
find_reachable_glyphs(depend, reachable);
printf("Found %u reachable glyphs\n", hb_set_get_population(reachable));
hb_set_destroy(reachable);
```
**Note:** This simple algorithm doesn't handle ligature dependencies or context sets
correctly. Ligatures should only be added when all component glyphs in their ligature
set are present, and contextual dependencies should only be followed when their positional
requirements are satisfied. For a production-quality implementation that correctly handles
ligatures, context filtering, and feature filtering to compute closures similar to
HarfBuzz's subset API, see `compute_depend_closure()` in `test/fuzzing/hb-depend-closure-parity.cc`
and `docs/depend-for-closure.md`.
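The ligature rule described in this note can be sketched independently of HarfBuzz. This is a minimal illustration on plain arrays, not the logic of `compute_depend_closure()`; the function names `closure_has` and `ligature_set_satisfied` and the array-based closure representation are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if gid appears in closure[0..closure_len). */
static bool closure_has(const uint32_t *closure, size_t closure_len, uint32_t gid)
{
  for (size_t i = 0; i < closure_len; i++)
    if (closure[i] == gid)
      return true;
  return false;
}

/* A ligature output should be added only when every component glyph of its
 * ligature set is already in the closure. An empty component list is
 * treated as satisfied. */
static bool ligature_set_satisfied(const uint32_t *components, size_t n_components,
                                   const uint32_t *closure, size_t closure_len)
{
  for (size_t i = 0; i < n_components; i++)
    if (!closure_has(closure, closure_len, components[i]))
      return false;
  return true;
}
```

With a real `hb_depend_t`, the component glyphs would presumably come from `hb_depend_get_set_from_index()`, and the membership test could use `hb_set_has()` or a single `hb_set_is_subset()` call on the two sets.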
## Working with Ligature Sets
When a dependency entry belongs to a ligature set, you can find all other members
of that set using `hb_depend_get_set_from_index()`:
```c
// After finding ligature_set != HB_CODEPOINT_INVALID from hb_depend_get_glyph_entry()...
// Get all glyphs in this ligature set
hb_set_t *ligature_glyphs = hb_set_create();
if (hb_depend_get_set_from_index(depend, ligature_set, ligature_glyphs)) {
  hb_codepoint_t lig_gid = HB_SET_VALUE_INVALID;
  printf("Members of ligature set %u:\n", ligature_set);
  while (hb_set_next(ligature_glyphs, &lig_gid)) {
    // Find the dependency entry index for this glyph in this ligature set
    hb_codepoint_t lig_index = 0;
    hb_tag_t lig_table_tag;
    hb_codepoint_t lig_dependent;
    hb_tag_t lig_layout_tag;
    hb_codepoint_t lig_ligature_set;
    hb_codepoint_t lig_context_set;
    while (hb_depend_get_glyph_entry(depend, lig_gid, lig_index++,
                                     &lig_table_tag, &lig_dependent,
                                     &lig_layout_tag, &lig_ligature_set, &lig_context_set, NULL)) {
      if (lig_ligature_set == ligature_set) {
        printf(" gid=%u, index=%u\n", lig_gid, lig_index - 1);
        break;
      }
    }
  }
}
hb_set_destroy(ligature_glyphs);
```
## Working with Context Sets
Context and ChainContext GSUB rules apply lookups only when specific backtrack and/or
lookahead glyphs are present. The depend API records these requirements in `context_set`
indices as optional information that can refine dependency traversal.
**Note:** Context sets are optional refinement information. Code that ignores
`context_set` (treating it as always satisfied) still works: it produces a
conservative over-approximation. Context sets allow more precise closure computation when
desired.
### Understanding Context Sets
When `context_set != HB_CODEPOINT_INVALID`, the edge includes positional requirements
from a Context or ChainContext rule. Context set elements use two encodings:
1. **Direct GID reference** (value < 0x80000000): A specific glyph ID
2. **Indirect set reference** (value >= 0x80000000): An index into the sets array
(mask off high bit with `value & 0x7FFFFFFF` to get the set index)
The high bit (0x80000000) distinguishes between direct and indirect references. For an
edge to apply, ALL elements in the context set must be satisfied:
- Direct references: that specific glyph must be present in the closure
- Indirect references: at least ONE glyph from the referenced set must be present (disjunction)
### Checking Context Requirements
```c
#include <stdbool.h>
bool check_context_satisfied(hb_depend_t *depend,
                             hb_codepoint_t context_set_idx,
                             hb_set_t *current_closure)
{
  if (context_set_idx == HB_CODEPOINT_INVALID)
    return true; // No context requirements
  hb_set_t *context_elements = hb_set_create();
  if (!hb_depend_get_set_from_index(depend, context_set_idx, context_elements))
  {
    hb_set_destroy(context_elements);
    return false; // Error retrieving context
  }
  bool satisfied = true;
  hb_codepoint_t elem = HB_SET_VALUE_INVALID;
  while (hb_set_next(context_elements, &elem))
  {
    if (elem < 0x80000000)
    {
      // Direct reference: check if specific glyph is in closure
      if (!hb_set_has(current_closure, elem))
      {
        satisfied = false;
        break;
      }
    }
    else
    {
      // Indirect reference: check if ANY glyph from set is in closure
      hb_codepoint_t set_idx = elem & 0x7FFFFFFF;
      hb_set_t *required_set = hb_set_create();
      if (hb_depend_get_set_from_index(depend, set_idx, required_set))
      {
        // Check if ANY element from required_set is in current_closure
        bool any_found = false;
        hb_codepoint_t gid = HB_SET_VALUE_INVALID;
        while (hb_set_next(required_set, &gid))
        {
          if (hb_set_has(current_closure, gid))
          {
            any_found = true;
            break;
          }
        }
        if (!any_found)
          satisfied = false;
      }
      else
      {
        satisfied = false; // Error retrieving set
      }
      hb_set_destroy(required_set);
      if (!satisfied)
        break;
    }
  }
  hb_set_destroy(context_elements);
  return satisfied;
}
```
For a complete implementation of context-aware closure computation, see the
`compute_depend_closure()` function in `test/fuzzing/hb-depend-closure-parity.cc`.
# API Details
## Dependency Entry Fields
Each dependency entry returned by `hb_depend_get_glyph_entry()` contains:
- **table_tag**: Source table (e.g., `HB_OT_TAG_GSUB`, `HB_TAG('c','m','a','p')`,
`HB_TAG('g','l','y','f')`, `HB_TAG('C','F','F',' ')`, `HB_TAG('C','O','L','R')`,
`HB_TAG('M','A','T','H')`)
- **dependent**: The dependent glyph ID
- **layout_tag**:
  - For GSUB: the feature tag (e.g., `HB_TAG('l','i','g','a')`)
  - For cmap with UVS: the variation selector codepoint
  - Otherwise: `HB_CODEPOINT_INVALID`
- **ligature_set**: For ligatures, identifies which ligature set; otherwise
`HB_CODEPOINT_INVALID`
- **context_set**: Optional refinement information for Context and ChainContext GSUB rules.
  When not `HB_CODEPOINT_INVALID`, identifies positional requirements (backtrack/lookahead)
  for this dependency. Can be ignored (treated as always satisfied) for a conservative
  over-approximation, or checked for more precise closure computation. See the "Working with
  Context Sets" section for details. *(Experimental feature from Phase 3 implementation)*
- **flags** (optional, nullable): Edge metadata flags. Pass NULL if not needed. Currently defined:
  - `HB_DEPEND_EDGE_FLAG_FROM_CONTEXT_POSITION` (0x01): Edge created from a multi-position
    contextual rule (Context or ChainContext with inputCount > 1). Depend extraction
    records edges based on what glyphs COULD be at each position according to the static
    input coverage/class. But during closure computation, lookups within the rule are
    applied sequentially: lookups at earlier positions may transform glyphs at later
    positions, and multiple lookups at the same position may interact (one produces a
    glyph, another immediately consumes it as an "intermediate"). So a glyph matching
    the coverage might not actually persist at that position when closure traverses
    the rule. These edges may cause over-approximation.
  - `HB_DEPEND_EDGE_FLAG_FROM_NESTED_CONTEXT` (0x02): Edge created from a lookup that was
    called from within another contextual lookup (nested context). The outer context's
    requirements are not preserved on this edge, so it may over-approximate.
These flags help distinguish between "true" over-approximation (a bug) and "expected"
over-approximation (a known limitation of the static dependency analysis). Closure
implementations can use these flags to report which type of over-approximation occurred.
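As a rough illustration of how a closure implementation might use these flags, the sketch below classifies an edge by its flag bits. It is self-contained: the two `EDGE_FLAG_*` constants are local mirrors of the values listed above (0x01 and 0x02), and `classify_over_approximation` and its enum are hypothetical names, not part of the depend API.

```c
#include <stdint.h>

/* Local mirrors of the documented flag values; real code would use the
 * HB_DEPEND_EDGE_FLAG_* constants instead. */
#define EDGE_FLAG_FROM_CONTEXT_POSITION 0x01u
#define EDGE_FLAG_FROM_NESTED_CONTEXT   0x02u

/* Hypothetical classification of an edge that led to an extra glyph. */
typedef enum {
  OVER_APPROX_UNEXPECTED, /* no flag set: worth investigating as a bug */
  OVER_APPROX_EXPECTED    /* a flag is set: known static-analysis limitation */
} over_approx_kind_t;

static over_approx_kind_t classify_over_approximation(uint32_t flags)
{
  const uint32_t known = EDGE_FLAG_FROM_CONTEXT_POSITION |
                         EDGE_FLAG_FROM_NESTED_CONTEXT;
  return (flags & known) ? OVER_APPROX_EXPECTED : OVER_APPROX_UNEXPECTED;
}
```

A closure comparison tool could call such a function on the `flags` value of each edge that introduced a glyph absent from the reference closure, then report the two categories separately.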
# Implementation Notes
**⚠️ Warning: Graph Cycles in Invalid Fonts**
The dependency graph may contain cycles when it is built from an invalid or malicious font.
While the OpenType specification requires COLR paint graphs to be directed acyclic
graphs (DAGs), the depend API faithfully reports the graph structure as it exists
in the font, including any cycles that may be present. Code that traverses
the depend graph should implement cycle detection to protect against invalid fonts.
For details on how cycles can occur and how PaintGlyph self-references are filtered,
see the "COLR Cycles and Self-References" section in `docs/depend-implementation.md`.
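One common way to stay safe is to track visited glyphs, so that each glyph is expanded at most once even if the graph loops back on itself. The `find_reachable_glyphs()` example relies on the same idea through its `hb_set_has()` check. The sketch below runs on a small hard-coded adjacency table rather than a real `hb_depend_t`; `MAX_GLYPHS`, `NO_EDGE`, `edges`, and `count_reachable` are hypothetical names for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GLYPHS 8
#define NO_EDGE UINT32_MAX

/* Toy graph: edges[g] lists up to two dependents of glyph g.
 * Glyphs 1 -> 2 -> 3 -> 1 form a cycle, as an invalid font might. */
static const uint32_t edges[MAX_GLYPHS][2] = {
  {1, NO_EDGE}, {2, NO_EDGE}, {3, 4}, {1, NO_EDGE},
  {NO_EDGE, NO_EDGE}, {NO_EDGE, NO_EDGE}, {NO_EDGE, NO_EDGE}, {NO_EDGE, NO_EDGE}
};

/* Counts glyphs reachable from start (including start itself).
 * Terminates on cyclic graphs because a glyph is marked visited when it is
 * pushed, so it can enter the stack at most once. */
static size_t count_reachable(uint32_t start)
{
  bool visited[MAX_GLYPHS] = {false};
  uint32_t stack[MAX_GLYPHS];
  size_t top = 0, count = 0;

  if (start >= MAX_GLYPHS)
    return 0;
  visited[start] = true;
  stack[top++] = start;

  while (top > 0) {
    uint32_t gid = stack[--top];
    count++;
    for (size_t i = 0; i < 2; i++) {
      uint32_t dep = edges[gid][i];
      if (dep == NO_EDGE || dep >= MAX_GLYPHS || visited[dep])
        continue; /* skip empty slots, bad IDs, and already-seen glyphs */
      visited[dep] = true;
      stack[top++] = dep;
    }
  }
  return count;
}
```

Marking a glyph as visited at push time, rather than at pop time, also bounds the stack at `MAX_GLYPHS` entries.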
For detailed information about the depend API implementation, including memory
management, data structures, and performance characteristics, see
`docs/depend-implementation.md`.
# Use Cases
## Font Subsetting
Determine which glyphs must be retained to properly render a specific set of
characters, accounting for all OpenType substitutions and compositions.
## Coverage Analysis
Analyze which features or scripts require which glyphs, useful for font
optimization and planning.
## Font Segmentation
Partition a font into smaller subsets where each segment contains glyphs reachable
from a specific set of input characters, enabling more efficient font delivery
for web applications.
## Testing
Verify that font modifications haven't inadvertently broken glyph references or
substitution chains.