Depend API Implementation

Introduction

The depend API implementation was derived from HarfBuzz's existing closure code paths. The closure infrastructure computes which glyphs are reachable from a starting set by following substitution rules, while the depend API extracts the underlying dependency graph that makes those substitutions possible.

This document describes the depend implementation by contrasting it with the closure paths, focusing on what is different and why.

Closure vs. Depend: Core Difference

Closure answers the question: “Starting from these glyphs, what glyphs can be reached?”

Depend answers the question: “What is the complete dependency graph showing which glyphs depend on which other glyphs?”

Example

Given a ligature rule: f + i → fi

Closure with input {f}: Returns {f, fi} (reachable glyphs)
Depend: Records edges f → fi and i → fi (the dependency graph)

The closure only tells you what's reachable from a specific starting point. The depend graph captures all possible relationships regardless of starting point.

Graph Extraction Strategy

The depend implementation extracts the complete dependency graph by walking the font structure once to record all possible dependency edges.

Active Glyph Filtering

During graph extraction, depend filters by active glyphs using parent_active_glyphs() (same as closure does). This prevents recording spurious edges for glyphs that can never participate in a particular substitution.

Critical difference from closure: Depend does NOT do sequential accumulation. In a contextual rule with multiple lookups, closure accumulates outputs from earlier lookups as inputs to later lookups. Depend records edges from ALL lookups independently to capture the complete graph structure.

Context Requirement Recording

For Context and ChainContext GSUB rules, depend records positional requirements (backtrack/lookahead) in context_set indices. This allows filtering during closure computation - edges are only followed when their positional requirements are satisfied.

Implementation in context_depend_recurse_lookups() (hb-ot-layout-gsubgpos.hh):

Build backtrack and lookahead glyph sets for the rule
Call build_context_set() to store these requirements
Set current_context_set_index so nested edges inherit the context
Recurse into nested lookups
Clear context index when done

Edge Flags for Over-Approximation Detection

Certain contextual edges cannot be precisely modeled by the static dependency graph. The depend API marks these edges with flags so closure implementations can distinguish between “expected” over-approximation (a known limitation) and “unexpected” over-approximation (a bug).

Two flags are defined in hb-depend-data.hh:

HB_DEPEND_EDGE_FLAG_FROM_CONTEXT_POSITION (0x01): Set when an edge comes from a multi-position contextual rule (inputCount > 1). Depend extraction records edges based on what glyphs COULD be at each position per the static input coverage/class. But during closure computation, lookups within the rule are applied sequentially: lookups at earlier positions may transform later positions, and multiple lookups at the same position may interact (one produces a glyph, another immediately consumes it as an “intermediate”). So depend may record edges from glyphs that wouldn't persist at that position.
HB_DEPEND_EDGE_FLAG_FROM_NESTED_CONTEXT (0x02): Set when a lookup is called from within another contextual lookup. The outer context's requirements are not preserved, so the edge may over-approximate.

These flags are set in context_depend_recurse_lookups():

if (inputCount > 1) {
  c->depend_data->current_edge_flags |= HB_DEPEND_EDGE_FLAG_FROM_CONTEXT_POSITION;
}
if (nested_contextual) {
  c->depend_data->current_edge_flags |= HB_DEPEND_EDGE_FLAG_FROM_NESTED_CONTEXT;
}

Closure implementations should track whether any flagged edges contributed to the result. If so, any extra glyphs compared to subset's closure are “expected” over-approximation.

Graph Completeness

The depend graph captures all edges that exist in the font structure. Recent testing with proper ligature_set, context_set, and feature filtering has not generated over-approximation cases when computing closures.

Using the Graph for Closure Computation

To compute closures using the extracted graph, see depend-for-closure.md which documents how to implement closure by iteratively following edges in the dependency graph with appropriate filtering.

Data Structures

Closure Data Structures

Closure modifies sets in place:

hb_set_t *glyphs;  // Modified during traversal
hb_map_t done_lookups_glyph_count;
hb_hashmap_t<unsigned, hb::unique_ptr<hb_set_t>> done_lookups_glyph_set;

Depend Data Structures

Depend builds a persistent graph with deduplication:

struct hb_depend_data_t {
  hb_vector_t<hb_glyph_depend_record_t> glyph_dependencies;  // Per-glyph edge lists
  hb_map_t<uint64_t, hb_vector_t<uint32_t>> edge_hashes;    // Deduplication
  hb_vector_t<hb_set_t> sets;                                // Ligature and context sets
  hb_map_t lookup_features[256];                             // Feature tracking
  hb_codepoint_t current_context_set_index;                  // Current context during extraction
  uint8_t current_edge_flags;                                // Flags for edges being extracted
};

Key features:

Deduplication: Hash-based duplicate detection prevents storing the same edge multiple times
Ligature sets: Tracks complete set of component glyphs needed for each ligature
Context sets: Records backtrack/lookahead requirements for contextual rules
Feature tracking: Records which features activate which lookups
Edge flags: Marks edges that may cause expected over-approximation (FROM_CONTEXT_POSITION, FROM_NESTED_CONTEXT)
Persistent storage: Graph survives beyond initial extraction

Graph Lifecycle and Memory Management

The depend graph is immutable after construction:

Construction: hb_depend_from_face_or_fail() walks all relevant OpenType tables once to build the complete graph
Temporary structures freed: After graph extraction, temporary structures used only during construction are freed:
- edge_hashes - deduplication hash table
- unicodes - Unicode codepoint set
- nominal_glyphs - codepoint-to-glyph mapping
- lookup_features - lookup-to-feature mapping
Persistent structures retained:
- glyph_dependencies - the core dependency graph
- sets - ligature sets for querying

This reduces memory footprint while keeping the graph queryable for the lifetime of the hb_depend_t object.

Dependencies are stored per-glyph and indexed sequentially starting from 0. Use hb_depend_get_glyph_entry() with incrementing index values to iterate through all dependencies for a given glyph.

Context Filtering and Recording

Closure Behavior

Closure filters glyphs by context during execution:

// In ContextSubst Format 1
const auto &input_coverage = this+coverageZ[0];
+ hb_zip (ruleSet, input_coverage)
| hb_filter (glyphs, hb_second)  // Filter by active glyphs
| hb_map (hb_first)
| hb_apply ([&] (const RuleSet &_) { _.closure (c); })
;

Depend Behavior

Depend filters glyphs AND records positional requirements:

// In ContextSubst Format 1 depend method
+ hb_zip (ruleSet, input_coverage)
| hb_filter (c->parent_active_glyphs(), hb_second)
| hb_apply ([&] (const hb_pair_t<...> &_) {
    const RuleSet& rs = this+_.first;
    hb_codepoint_t value = _.second;
    rs.depend (c, value);  // Pass context value
  })
;

The value parameter identifies which glyph from the coverage was matched, allowing the depend code to filter nested lookups appropriately.

Additionally, for Context and ChainContext rules, depend builds and records context_set:

// Build context set for this rule (in context_depend_recurse_lookups)
hb_codepoint_t context_set_idx = c->depend_data->build_context_set(
    &lookahead_sets, nullptr);  // or both backtrack and lookahead
c->depend_data->current_context_set_index = context_set_idx;

// Recurse into lookups - all edges inherit this context_set
context_depend_recurse_lookups(...);

// Clear context after processing
c->depend_data->current_context_set_index = HB_CODEPOINT_INVALID;

This records which backtrack and lookahead glyphs must be present for each edge, enabling accurate closure computation that matches subset behavior in recent testing.

Feature Tagging

Depend tracks which OpenType features are associated with each dependency edge. This allows subsetting tools to determine which glyphs are needed for specific feature sets.

Implementation

During GSUB initialization in hb_depend_t::get_gsub_dependencies():

// Map features to lookups
for (unsigned i = 0; i < feature_count; i++) {
  hb_tag_t ft = gsub.get_feature_tag(i);
  const Feature* feature_ptr = &(fv.get_feature(i));

  for (auto lookup_index : lookup_indexes) {
    data.lookup_features[lookup_index].add(ft);
  }
}

When recording dependencies:

// Look up which features activate this lookup
hb_tag_t layout_tag = HB_TAG_NONE;
if (!data.lookup_features[lookup_index].is_empty()) {
  layout_tag = *data.lookup_features[lookup_index].iter();
}

// Record edge with feature tag
data.add_depend_layout(gid, HB_OT_TAG_GSUB, layout_tag, dependent, ...);

This is stored in the layout_tag field of each dependency record and can be queried via hb_depend_get_glyph_entry().

Ligature Sets

For ligature substitutions, depend tracks the complete set of component glyphs needed to form each ligature. This is stored as a set of glyphs indexed by a ligature_set ID.

Why This Matters

Consider a ligature substitution rule:

"f" + "f" → "ff_ligature"

The ligature output depends on BOTH component glyphs. During closure computation, the ligature should only be added when ALL components are present in the closure.

The ligature_set field identifies which components must be present together. All edges for the same ligature (from each component to the ligature output) share the same ligature_set index, allowing closure computation to check if all components are available before adding the ligature output.

This enables correct ligature handling during closure computation.

COLR Cycles and Self-References

Cycle Handling

The OpenType specification requires COLR paint graphs to be directed acyclic graphs (DAGs), but invalid or malicious fonts may contain cycles. The depend API faithfully reports the graph structure as it exists in the font, including any cycles.

Cycles can occur through COLRv1 PaintColrGlyph (Format 11) operations that reference base glyphs recursively, creating paths like A→B, B→C, C→A. The depend implementation does not perform cycle detection during graph extraction - it records all edges as they appear in the font.

PaintGlyph Self-Reference Filtering

COLRv1 PaintGlyph (Format 10) self-references are specifically filtered out and NOT reported as dependencies. This is because Format 10 references glyph outlines (from glyf/CFF tables) as geometry, not paint graphs. A base glyph referencing its own outline to fill is valid per spec.

Implementation in colrv1-depend.hh:56:

// PaintGlyph (Format 10) - filter self-references
if (gid != c->source_gid)
  c->depend_data->add_depend(c->source_gid, HB_OT_TAG_COLR, gid);

Contrast with PaintColrGlyph (Format 11) which does NOT filter self-references, as these represent actual paint graph dependencies that may form cycles in invalid fonts.

Important: Implementations traversing the depend graph should implement cycle detection to protect against invalid fonts.

Method Naming Convention

The depend methods follow the closure naming pattern with depend instead of closure:

Closure Method	Depend Method	Purpose
`closure(hb_closure_context_t*)`	`depend(hb_depend_context_t*)`	Process this subtable
`closure_lookups(...)`	N/A	Depend doesn't need lookup closure
`intersects(...)`	`intersects(...)`	Shared - check if subtable is relevant

All depend methods are guarded by #ifdef HB_DEPEND_API to allow conditional compilation.

Performance Characteristics

Closure

First query: O(font_size) - must walk entire font
Subsequent queries: O(font_size) - must walk entire font again
Space: O(1) - no persistent storage beyond input sets

Depend

Graph extraction: O(font_size) - one-time cost
Closure queries: O(edges_in_closure) - much faster, uses pre-built graph
Space: O(edges) - stores complete dependency graph

The depend API trades space for time - it pays the cost of extracting the full graph once, then subsequent closure queries are much faster because they just follow pre-computed edges.

Code Structure

Key Files

src/hb-depend.h - Public C API
src/hb-depend.hh - Internal C++ API
src/hb-depend.cc - Main implementation (table dispatching)
src/hb-depend-data.hh - Data structures and edge storage
src/hb-ot-layout-gsubgpos.hh - Contextual lookup handling (shared between closure and depend)
src/OT/Layout/GSUB/*.hh - GSUB format-specific depend methods
src/OT/Color/COLR/colrv1-depend.hh - COLRv1 depend implementation

Note: The public API includes hb_depend_print() for debugging, which prints the full graph to stdout. For programmatic access, use hb_depend_get_glyph_entry() instead.

Parallel Structure

Each closure method has a corresponding depend method in the same file:

// In SingleSubstFormat1.hh

void closure (hb_closure_context_t *c) const {
  // Closure implementation
}

#ifdef HB_DEPEND_API
bool depend (hb_depend_context_t *c) const {
  // Depend implementation - similar structure, different output
}
#endif

This keeps related code together and makes it easier to maintain consistency.

Summary of Key Differences

Aspect	Closure	Depend
Goal	Compute reachable glyphs from a starting set	Extract complete dependency graph
Output	Modified glyph set	Persistent graph with edges
Repeated queries	Must re-walk font each time	O(1) edge lookups in graph
Active glyph filtering	Filters by active glyphs during traversal	Filters by parent_active_glyphs() during extraction
Sequential accumulation	Accumulates outputs for next lookup	Does NOT accumulate (records all edges)
Context filtering	Filters during execution	Records requirements in context_set
Edge flags	N/A	Marks edges for expected over-approximation
Feature tagging	Not tracked	Tracks which features activate edges
Ligature sets	Not tracked	Tracks complete component sets
Deduplication	Not needed	Hash-based to prevent duplicate edges
Memory management	Minimal - no persistent graph	Temporary structures freed after construction
Space complexity	O(1)	O(edges)
Time complexity	O(font_size) per query	O(font_size) once, then O(edges) per query

Future Considerations

The depend API currently focuses on GSUB, glyf, cmap, COLR, and MATH tables. Future extensions could include:

GPOS dependencies: Some GPOS rules reference glyphs (e.g., mark-to-base positioning)
MORX dependencies: Apple Advanced Typography morph tables
Optimization: The graph could be compressed or indexed differently for even faster queries
Incremental updates: Support for modifying the graph as the font changes

The foundation is designed to be extensible - adding support for new tables follows the same pattern established by existing implementations.