| <?xml version="1.0"?> | 
 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN" | 
 |                "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [ | 
 |   <!ENTITY % local.common.attrib "xmlns:xi  CDATA  #FIXED 'http://www.w3.org/2003/XInclude'"> | 
 |   <!ENTITY version SYSTEM "version.xml"> | 
 | ]> | 
 | <chapter id="what-is-harfbuzz"> | 
 |   <title>What is HarfBuzz?</title> | 
 |   <para> | 
 |     HarfBuzz is a <emphasis>text-shaping engine</emphasis>. If you | 
 |     give HarfBuzz a font and a string containing a sequence of Unicode | 
 |     codepoints, HarfBuzz selects and positions the corresponding | 
 |     glyphs from the font, applying all of the necessary layout rules | 
 |     and font features. HarfBuzz then returns the string to you in the | 
 |     form that is correctly arranged for the language and writing | 
 |     system.  | 
 |   </para> | 
 |   <para> | 
 |     HarfBuzz can properly shape all of the world's major writing | 
 |     systems. It runs on all major operating systems and software | 
 |     platforms and it supports the major font formats in use | 
 |     today. | 
 |   </para> | 
 |   <section id="what-is-text-shaping"> | 
 |     <title>What is text shaping?</title> | 
 |     <para> | 
 |       Text shaping is the process of translating a string of character | 
 |       codes (such as Unicode codepoints) into a properly arranged | 
 |       sequence of glyphs that can be rendered onto a screen or into | 
 |       final output form for inclusion in a document. | 
 |     </para> | 
 |     <para> | 
 |       The shaping process is dependent on the input string, the active | 
 |       font, the script (or writing system) that the string is in, and | 
 |       the language that the string is in. | 
 |     </para> | 
 |     <para> | 
 |       Modern software systems generally only deal with strings in the | 
 |       Unicode encoding scheme (although legacy systems and documents may | 
 |       involve other encodings). | 
 |     </para> | 
 |     <para> | 
 |       There are several font formats that a program might | 
 |       encounter, each of which has a set of standard text-shaping | 
 |       rules. | 
 |     </para> | 
 |     <para>The dominant format is <ulink | 
 |       url="http://www.microsoft.com/typography/otspec/">OpenType</ulink>. The | 
 |     OpenType specification defines a series of <ulink url="https://github.com/n8willis/opentype-shaping-documents">shaping models</ulink> for | 
 |     various scripts from around the world. These shaping models depend on | 
 |     the font incorporating certain features as | 
 |     <emphasis>lookups</emphasis> in its <literal>GSUB</literal>  | 
 |     and <literal>GPOS</literal> tables. | 
 |     </para> | 
 |     <para> | 
 |       Alternatively, OpenType fonts can include shaping features for | 
 |       the <ulink url="https://graphite.sil.org/">Graphite</ulink> shaping model. | 
 |     </para> | 
 |     <para> | 
 |       TrueType fonts can also include OpenType shaping | 
 |       features. Alternatively, TrueType fonts can also include <ulink url="https://developer.apple.com/fonts/TrueType-Reference-Manual/RM09/AppendixF.html">Apple | 
 |       Advanced Typography</ulink> (AAT) tables to implement shaping | 
 |       support. AAT fonts are generally only found on macOS and iOS systems. | 
 |     </para> | 
 |     <para> | 
 |       Text strings will usually be tagged with a script and language | 
 |       tag that provide the context needed to perform text shaping | 
 |       correctly.  The necessary <ulink | 
 |       url="https://docs.microsoft.com/en-us/typography/opentype/spec/scripttags">script</ulink>  | 
 |       and <ulink | 
 |       url="https://docs.microsoft.com/en-us/typography/opentype/spec/languagetags">language</ulink> | 
 |       tags are defined by OpenType. | 
 |     </para> | 
 |   </section> | 
 |    | 
 |   <section id="why-do-i-need-a-shaping-engine"> | 
 |     <title>Why do I need a shaping engine?</title> | 
 |     <para> | 
 |       Text shaping is an integral part of preparing text for | 
 |       display. Before a Unicode sequence can be rendered, the | 
 |       codepoints in the sequence must be mapped to the corresponding | 
 |       glyphs provided in the font, and those glyphs must be positioned | 
 |       correctly relative to each other. For many of the scripts | 
 |       supported in Unicode, these steps involve script-specific layout | 
 |       rules, including complex joining, reordering, and positioning | 
 |       behavior. Implementing these rules is the job of the shaping engine. | 
 |     </para> | 
 |     <para> | 
 |       Text shaping is a fairly low-level operation. HarfBuzz is | 
 |       used directly by text-handling libraries like <ulink | 
 |       url="https://www.pango.org/">Pango</ulink>, as well as by the layout | 
 |       engines in Firefox, LibreOffice, and Chromium. Unless you are | 
 |       <emphasis>writing</emphasis> one of these layout engines | 
 |       yourself, you will probably not need to use HarfBuzz: normally, | 
 |       a layout engine, toolkit, or other library will turn text into | 
 |       glyphs for you. | 
 |     </para> | 
 |     <para> | 
 |       However, if you <emphasis>are</emphasis> writing a layout engine | 
 |       or graphics library yourself, then you will need to perform text | 
 |       shaping, and this is where HarfBuzz can help you. | 
 |     </para> | 
 |     <para> | 
 |       Here are some specific scenarios where a text-shaping engine | 
 |       like HarfBuzz helps you: | 
 |     </para> | 
 |     <itemizedlist> | 
 |       <listitem> | 
 |         <para> | 
 |           OpenType fonts contain a set of glyphs (that is, shapes | 
 | 	  to represent the letters, numbers, punctuation marks, and | 
 | 	  all other symbols), which are indexed by a <literal>glyph ID</literal>. | 
 | 	</para> | 
 | 	<para> | 
 |           A particular glyph ID within the font does not necessarily | 
 | 	  correlate to a predictable Unicode codepoint. For instance, | 
 | 	  some fonts have the letter "a" as glyph ID 1, but | 
 | 	  many others do not. In order to retrieve the right glyph | 
 | 	  from the font to display "a", you need to consult | 
 | 	  the table inside the font (the <literal>cmap</literal> | 
 | 	  table) that maps Unicode codepoints to glyph IDs. In other | 
 | 	  words, <emphasis>text shaping turns codepoints into glyph | 
 | 	  IDs</emphasis>. | 
 |         </para> | 
 |       </listitem> | 
 |       <listitem> | 
 |         <para> | 
 |           Many OpenType fonts contain ligatures: combinations of | 
 |           characters that are rendered as a single unit. For instance, | 
 | 	  it is common for the "f, i" letter | 
 | 	  sequence to appear in print as the single ligature glyph | 
 | 	  "fi". | 
 | 	</para> | 
 | 	<para> | 
 | 	  Whether you should render an "f, i" sequence | 
 | 	  as <literal>fi</literal> or as "fi" does not | 
 |           depend on the input text. Instead, it depends on the whether | 
 | 	  or not the font includes an "fi" glyph and on the | 
 | 	  level of ligature application you wish to perform. The font | 
 | 	  and the amount of ligature application used are under your | 
 | 	  control. In other words, <emphasis>text shaping involves | 
 | 	  querying the font's ligature tables and determining what | 
 | 	  substitutions should be made</emphasis>.  | 
 |         </para> | 
 |       </listitem> | 
 |       <listitem> | 
 |         <para> | 
 |           While ligatures like "fi" are optional typographic | 
 |           refinements, some languages <emphasis>require</emphasis> certain | 
 |           substitutions to be made in order to display text correctly. | 
 |         </para> | 
 | 	<para> | 
 | 	  For example, in Tamil, when the letter "TTA" (ட) | 
 | 	  letter is followed by the vowel sign "U" (ு), the pair | 
 | 	  must be replaced by the single glyph "டு". The | 
 | 	  sequence of Unicode characters "ட,ு" needs to be | 
 | 	  substituted with a single "டு" glyph from the | 
 | 	  font. | 
 | 	</para> | 
 | 	<para> | 
 | 	  But "டு" does not have a Unicode codepoint. To | 
 | 	  find this glyph, you need to consult the table inside  | 
 | 	  the font (the <literal>GSUB</literal> table) that contains | 
 | 	  substitution information. In other words, <emphasis>text shaping  | 
 | 	  chooses the correct glyph for a sequence of characters | 
 | 	  provided</emphasis>. | 
 |         </para> | 
 |       </listitem> | 
 |       <listitem> | 
 |         <para> | 
 |           Similarly, each Arabic character has four different variants | 
 | 	  corresponding to the different positions it might appear in | 
 | 	  within a sequence. Inside a font, there will be separate | 
 | 	  glyphs for the initial, medial, final, and isolated forms of | 
 | 	  each letter, each at a different glyph ID. | 
 | 	</para> | 
 | 	<para> | 
 | 	  Unicode only assigns one codepoint per character, so a | 
 | 	  Unicode string will not tell you which glyph variant to use | 
 | 	  for each character. To decide, you need to analyze the whole | 
 | 	  string and determine the appropriate glyph for each character | 
 | 	  based on its position. In other words, <emphasis>text | 
 | 	  shaping chooses the correct form of the letter by its | 
 | 	  position and returns the correct glyph from the font</emphasis>. | 
 |         </para> | 
 |       </listitem> | 
 |       <listitem> | 
 |         <para> | 
 |           Other languages involve marks and accents that need to be | 
 |           rendered in specific positions relative a base character. For | 
 |           instance, the Moldovan language includes the Cyrillic letter | 
 |           "zhe" (ж) with a breve accent, like so: "ӂ". | 
 | 	</para> | 
 | 	<para> | 
 | 	  Some fonts will provide this character as a single | 
 | 	  zhe-with-breve glyph, but other fonts will not and, instead, | 
 | 	  will expect the rendering engine to form the character by  | 
 |           superimposing the separate "ж" and "˘" | 
 | 	  glyphs. | 
 | 	</para> | 
 | 	<para> | 
 | 	  But exactly where you should draw the breve depends on the | 
 | 	  height and width of the preceding zhe glyph. To find the | 
 | 	  right position, you need to consult the table inside | 
 | 	  the font (the <literal>GPOS</literal> table) that contains | 
 | 	  positioning information. | 
 |           In other words, <emphasis>text shaping tells you whether you | 
 | 	  have a precomposed glyph within your font or if you need to | 
 | 	  compose a glyph yourself out of combining marks—and, | 
 | 	  if so, where to position those marks.</emphasis> | 
 |         </para> | 
 |       </listitem> | 
 |     </itemizedlist> | 
 |     <para> | 
 |       If tasks like these are something that you need to do, then you | 
 |       need a text shaping engine. You could use Uniscribe if you are | 
 |       writing Windows software; you could use CoreText on macOS; or | 
 |       you could use HarfBuzz. | 
 |     </para> | 
 |     <note> | 
 |       <para> | 
 | 	In the rest of this manual, the text will assume that the reader | 
 | 	is that implementor of a text-layout engine. | 
 |       </para> | 
 |     </note> | 
 |   </section> | 
 |    | 
 |  | 
 |   <section> | 
 |     <title>What does HarfBuzz do?</title> | 
 |     <para> | 
 |       HarfBuzz provides text shaping through a cross-platform | 
 |       C API that accepts sequences of Unicode codepoints as input. Currently, | 
 |       the following OpenType shaping models are supported: | 
 |     </para> | 
 |     <itemizedlist> | 
 |       <listitem> | 
 | 	<para> | 
 | 	  Indic (covering Devanagari, Bengali, Gujarati, | 
 | 	  Gurmukhi, Kannada, Malayalam, Oriya, Tamil, Telugu, and | 
 | 	  Sinhala) | 
 | 	</para> | 
 |       </listitem> | 
 |       <listitem> | 
 | 	<para> | 
 | 	  Arabic (covering Arabic, N'Ko, Syriac, and Mongolian) | 
 | 	</para> | 
 |       </listitem> | 
 |       <listitem> | 
 | 	<para> | 
 | 	  Thai and Lao | 
 | 	</para> | 
 |       </listitem> | 
 |       <listitem> | 
 | 	<para> | 
 | 	  Khmer | 
 | 	</para> | 
 |       </listitem> | 
 |       <listitem> | 
 | 	<para> | 
 | 	  Myanmar | 
 | 	</para> | 
 |       </listitem> | 
 |        | 
 |       <listitem> | 
 | 	<para> | 
 | 	  Tibetan | 
 | 	</para> | 
 |       </listitem> | 
 |        | 
 |       <listitem> | 
 | 	<para> | 
 | 	  Hangul | 
 | 	</para> | 
 |       </listitem> | 
 |        | 
 |       <listitem> | 
 | 	<para> | 
 | 	  Hebrew | 
 | 	</para> | 
 |       </listitem>       | 
 |       <listitem> | 
 | 	<para> | 
 | 	  The Universal Shaping Engine or <emphasis>USE</emphasis> | 
 | 	  (covering complex scripts not covered by the above shaping | 
 | 	  models) | 
 | 	</para> | 
 |       </listitem>       | 
 |       <listitem> | 
 | 	<para> | 
 | 	  A default shaping model for non-complex scripts | 
 | 	  (covering Latin, Cyrillic, Greek, Armenian, Georgian, Tifinagh, | 
 | 	  and many others) | 
 | 	</para> | 
 |       </listitem> | 
 |       <listitem> | 
 | 	<para> | 
 | 	  Emoji (including emoji modifier sequences, flag sequences, | 
 | 	  and ZWJ sequences) | 
 | 	</para> | 
 |       </listitem> | 
 |     </itemizedlist> | 
 |  | 
 |     <para> | 
 |       In addition to OpenType shaping, HarfBuzz supports the latest | 
 |       version of Graphite shaping (the "Graphite 2" model) and AAT | 
 |       shaping. | 
 |     </para> | 
 |      | 
 |     <para> | 
 |       HarfBuzz can read and understand TrueType fonts (.ttf), TrueType | 
 |       collections (.ttc), and OpenType fonts (.otf, including those | 
 |       fonts that contain TrueType-style outlines and those that | 
 |       contain PostScript CFF or CFF2 outlines). | 
 |     </para> | 
 |  | 
 |     <para> | 
 |       HarfBuzz is designed and tested to run on top of the FreeType | 
 |       font renderer. It can run on Linux, Android, Windows, macOS, and | 
 |       iOS systems. | 
 |     </para> | 
 |      | 
 |     <para> | 
 |       In addition to its core shaping functionality, HarfBuzz provides | 
 |       functions for accessing other font features, including optional | 
 |       GSUB and GPOS OpenType features, as well as | 
 |       all color-font formats (<literal>CBDT</literal>, | 
 |       <literal>sbix</literal>, <literal>COLR/CPAL</literal>, and | 
 |       <literal>SVG-OT</literal>) and OpenType variable fonts. HarfBuzz | 
 |       also includes a font-subsetting feature. HarfBuzz can perform | 
 |       some low-level math-shaping operations, although it does not | 
 |       currently perform full shaping for mathematical typesetting. | 
 |     </para> | 
 |      | 
 |     <para> | 
 |       A suite of command-line utilities is also provided in the | 
 |       source-code tree, designed to help users test and debug | 
 |       HarfBuzz's features on real-world fonts and input. | 
 |     </para> | 
 |   </section> | 
 |  | 
 |   <section id="what-harfbuzz-doesnt-do"> | 
 |     <title>What HarfBuzz doesn't do</title> | 
 |     <para> | 
 |       HarfBuzz will take a Unicode string, shape it, and give you the | 
 |       information required to lay it out correctly on a single | 
 |       horizontal (or vertical) line using the font provided. That is the | 
 |       extent of HarfBuzz's responsibility. | 
 |     </para> | 
 |     <para> | 
 |       It is important to note that if you are implementing a complete | 
 |       text-layout engine you may have other responsibilities that | 
 |       HarfBuzz will <emphasis>not</emphasis> help you with. For example: | 
 |     </para> | 
 |     <itemizedlist> | 
 |       <listitem> | 
 |         <para> | 
 |           HarfBuzz won't help you with bidirectionality. If you want to | 
 |           lay out text that includes a mix of Hebrew and English, you | 
 | 	  will need to ensure that each buffer provided to HarfBuzz | 
 | 	  has all of its characters in the same order and that the | 
 | 	  directionality of the buffer is set correctly. This may mean | 
 | 	  segmenting the text before it is placed into HarfBuzz buffers. In | 
 |           other words, the user will hit the keys in the following | 
 |           sequence: | 
 |         </para> | 
 |         <programlisting> | 
 | 	  A B C [space] ג ב א [space] D E F | 
 |         </programlisting> | 
 |         <para> | 
 |           but will expect to see in the output: | 
 |         </para> | 
 |         <programlisting> | 
 | 	  ABC אבג DEF | 
 |         </programlisting> | 
 |         <para> | 
 |           This reordering is called <emphasis>bidi processing</emphasis> | 
 |           ("bidi" is short for bidirectional), and there's an | 
 |           algorithm as an annex to the Unicode Standard which tells you how | 
 |           to process a string of mixed directionality. | 
 |           Before sending your string to HarfBuzz, you may need to apply the | 
 |           bidi algorithm to it. Libraries such as <ulink | 
 | 	  url="http://icu-project.org/">ICU</ulink> and <ulink | 
 | 	  url="http://fribidi.org/">fribidi</ulink> can do this for you. | 
 |         </para> | 
 |       </listitem> | 
 |       <listitem> | 
 |         <para> | 
 |           HarfBuzz won't help you with text that contains different font | 
 |           properties. For instance, if you have the string "a | 
 |           <emphasis>huge</emphasis> breakfast", and you expect | 
 |           "huge" to be italic, then you will need to send three | 
 |           strings to HarfBuzz: <literal>a</literal>, in your Roman font; | 
 |           <literal>huge</literal> using your italic font; and | 
 |           <literal>breakfast</literal> using your Roman font again. | 
 | 	</para> | 
 | 	<para> | 
 |           Similarly, if you change the font, font size, script, | 
 | 	  language, or direction within your string, then you will | 
 | 	  need to shape each run independently and output them | 
 | 	  independently. HarfBuzz expects to shape a run of characters | 
 | 	  that all share the same properties. | 
 |         </para> | 
 |       </listitem> | 
 |       <listitem> | 
 |         <para> | 
 |           HarfBuzz won't help you with line breaking, hyphenation, or | 
 |           justification. As mentioned above, HarfBuzz lays out the string | 
 |           along a <emphasis>single line</emphasis> of, notionally, | 
 |           infinite length. If you want to find out where the potential | 
 |           word, sentence and line break points are in your text, you | 
 |           could use the ICU library's break iterator functions. | 
 |         </para> | 
 |         <para> | 
 |           HarfBuzz can tell you how wide a shaped piece of text is, which is | 
 |           useful input to a justification algorithm, but it knows nothing | 
 |           about paragraphs, lines or line lengths. Nor will it adjust the | 
 |           space between words to fit them proportionally into a line. | 
 |         </para> | 
 |       </listitem> | 
 |     </itemizedlist> | 
 |     <para> | 
 |       As a layout-engine implementor, HarfBuzz will help you with the | 
 |       interface between your text and your font, and that's something | 
 |       that you'll need—what you then do with the glyphs that your font | 
 |       returns is up to you.  | 
 |     </para> | 
 |   </section> | 
 |      | 
 |   <section id="why-is-it-called-harfbuzz"> | 
 |     <title>Why is it called HarfBuzz?</title> | 
 |     <para> | 
 |       HarfBuzz began its life as text-shaping code within the FreeType | 
 |       project (and you will see references to the FreeType authors | 
 |       within the source code copyright declarations), but was then | 
 |       extracted out to its own project. This project is maintained by | 
 |       Behdad Esfahbod, who named it HarfBuzz. Originally, it was a | 
 |       shaping engine for OpenType fonts—"HarfBuzz" is | 
 |       the Persian for "open type". | 
 |     </para> | 
 |   </section> | 
 | </chapter> |