 <?xml version="1.0"?>
 <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
                "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
   <!ENTITY % local.common.attrib "xmlns:xi  CDATA  #FIXED 'http://www.w3.org/2003/XInclude'">
   <!ENTITY version SYSTEM "version.xml">
 ]>
 <chapter id="buffers-language-script-and-direction">
   <title>Buffers, language, script and direction</title>
   <para>
     The input to the HarfBuzz shaper is a series of Unicode characters, stored in a
     buffer. In this chapter, we'll look at how to set up a buffer with
     the text that we want and how to customize the properties of the
     buffer. We'll also look at a piece of lower-level machinery that
     you will need to understand before proceeding: the functions that
     HarfBuzz uses to retrieve Unicode information.
   </para>
   <para>
     After shaping is complete, HarfBuzz puts its output back
     into the buffer. But getting that output requires setting up a
     face and a font first, so we will look at that in the next chapter
     instead of here.
   </para>
   <section id="creating-and-destroying-buffers">
     <title>Creating and destroying buffers</title>
     <para>
       As we saw in our <emphasis>Getting Started</emphasis> example, a
       buffer is created and
       initialized with <function>hb_buffer_create()</function>. This
       produces a new, empty buffer object, instantiated with some
       default values and ready to accept your Unicode strings.
     </para>
     <para>
       HarfBuzz manages the memory of objects (such as buffers) that it
       creates, so you don't have to. When you have finished working on
       a buffer, you can call <function>hb_buffer_destroy()</function>:
     </para>
     <programlisting language="C">
       hb_buffer_t *buf = hb_buffer_create();
       ...
       hb_buffer_destroy(buf);
     </programlisting>
     <para>
       This will destroy the object and free its associated memory,
       unless some other part of the program holds a reference to this
       buffer. If you acquire a HarfBuzz buffer from another subsystem
       and want to ensure that it is not freed when someone
       else destroys it, you should increase its reference count:
     </para>
     <programlisting language="C">
       void somefunc(hb_buffer_t *buf) {
       buf = hb_buffer_reference(buf);
       ...
     </programlisting>
     <para>
       And then decrease it once you're done with it:
     </para>
     <programlisting language="C">
       hb_buffer_destroy(buf);
       }
     </programlisting>
     <para>
       While we are on the subject of reference-counting buffers, it is
       worth noting that an individual buffer can only meaningfully be
       used by one thread at a time.
     </para>
     <para>
       To throw away all the data in your buffer and start from scratch,
       call <function>hb_buffer_reset(buf)</function>. If you want to
       throw away the string in the buffer but keep the options, you can
       instead call <function>hb_buffer_clear_contents(buf)</function>.
     </para>
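     <para>
       The following is a minimal sketch of how referencing, clearing,
       and destroying fit together. The helper name
       <function>clear_shared_buffer()</function> is hypothetical, and
       the sketch assumes that the "options" kept by
       <function>hb_buffer_clear_contents()</function> include the
       buffer flags.
     </para>
     <programlisting language="C"><![CDATA[
#include <hb.h>

/* Hypothetical helper: clears the text from a buffer that another
 * part of the program also owns, and returns how many items remain. */
unsigned int
clear_shared_buffer (hb_buffer_t *buf)
{
  unsigned int remaining;

  /* Take our own reference so the buffer stays alive while we use it,
   * even if another owner destroys its reference in the meantime. */
  buf = hb_buffer_reference (buf);

  /* Drop the text only; hb_buffer_reset() would also restore defaults. */
  hb_buffer_clear_contents (buf);
  remaining = hb_buffer_get_length (buf);

  /* Release our reference. The memory is freed only when the last
   * reference is gone. */
  hb_buffer_destroy (buf);
  return remaining;
}
]]></programlisting>
     <para>
       Every <function>hb_buffer_reference()</function> call is paired
       with exactly one <function>hb_buffer_destroy()</function> call,
       so the caller's own reference is left untouched.
     </para>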
   </section>

   <section id="adding-text-to-the-buffer">
     <title>Adding text to the buffer</title>
     <para>
       Now we have a brand new HarfBuzz buffer. Let's start filling it
       with text! From HarfBuzz's perspective, a buffer is just a stream
       of Unicode code points, but your input string is probably in one of
       the standard Unicode character encodings (UTF-8, UTF-16, or
       UTF-32). HarfBuzz provides convenience functions that accept
       each of these encodings:
       <function>hb_buffer_add_utf8()</function>,
       <function>hb_buffer_add_utf16()</function>, and
       <function>hb_buffer_add_utf32()</function>. Other than the
       character encoding they accept, they function identically.
     </para>
     <para>
       You can add UTF-8 text to a buffer by passing in the text array,
       the array's length, an offset into the array for the first
       character to add, and the length of the segment to add:
     </para>
     <programlisting language="C">
     void
     hb_buffer_add_utf8 (hb_buffer_t *buf,
                         const char *text,
                         int text_length,
                         unsigned int item_offset,
                         int item_length);
     </programlisting>
     <para>
       So, in practice, you can say:
     </para>
     <programlisting language="C">
       hb_buffer_add_utf8(buf, text, strlen(text), 0, strlen(text));
     </programlisting>
     <para>
       This will append your new characters to
       <parameter>buf</parameter>, not replace its existing
       contents. Also, note that you can use <literal>-1</literal> in
       place of the first instance of <function>strlen(text)</function>
       if your text array is NUL-terminated. Similarly, you can also use
       <literal>-1</literal> as the final argument if you want to add
       its full contents.
     </para>
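     <para>
       As a small illustration of the <literal>-1</literal> shorthand
       and of the append behavior, here is a sketch built around a
       hypothetical helper, <function>append_utf8()</function>, which
       is not part of HarfBuzz:
     </para>
     <programlisting language="C"><![CDATA[
#include <hb.h>

/* Hypothetical helper: appends a NUL-terminated UTF-8 string to buf
 * and returns the buffer's new length, counted in code points. */
unsigned int
append_utf8 (hb_buffer_t *buf, const char *text)
{
  /* -1 as text_length: measure the array up to its terminating NUL.
   * -1 as item_length: add everything from item_offset to the end. */
  hb_buffer_add_utf8 (buf, text, -1, 0, -1);
  return hb_buffer_get_length (buf);
}
]]></programlisting>
     <para>
       Calling this helper twice on the same buffer grows the buffer
       each time, since the second call appends to what the first one
       added. The length is counted in code points, so a character that
       takes two bytes in UTF-8 still counts once.
     </para>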
     <para>
       Whatever <parameter>item_offset</parameter> and
       <parameter>item_length</parameter> you provide, HarfBuzz will also
       attempt to grab the five characters <emphasis>before</emphasis>
       the offset point and the five characters
       <emphasis>after</emphasis> the designated end. These are the
       before and after "context" segments, which are used internally
       for HarfBuzz to make shaping decisions. They will not be part of
       the final output, but they ensure that HarfBuzz's
       script-specific shaping operations are correct. If there are
       fewer than five characters available for the before or after
       contexts, HarfBuzz will just grab what is there.
     </para>
     <para>
       For longer text runs, such as full paragraphs, it might be
       tempting to only add smaller sub-segments to a buffer and
       shape them in piecemeal fashion. Generally, this is not a good
       idea, however, because a lot of shaping decisions are
       dependent on this context information. For example, in Arabic
       and other connected scripts, HarfBuzz needs to know the code
       points before and after each character in order to correctly
       determine which glyph to return.
     </para>
     <para>
       The safest approach is to add all of the text available (even
       if your text contains a mix of scripts, directions, languages
       and fonts), then use <parameter>item_offset</parameter> and
       <parameter>item_length</parameter> to indicate which characters you
       want shaped (which must all have the same script, direction,
       language and font), so that HarfBuzz has access to any context.
     </para>
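     <para>
       A sketch of this approach might look like the following. The
       helper <function>add_run_with_context()</function> is
       hypothetical. Note that, for UTF-8 input, the offset and length
       are counted in bytes of the text array.
     </para>
     <programlisting language="C"><![CDATA[
#include <hb.h>
#include <string.h>

/* Hypothetical helper: loads one run of a paragraph into buf while
 * leaving the rest of the paragraph visible to HarfBuzz as context. */
void
add_run_with_context (hb_buffer_t *buf, const char *paragraph,
                      unsigned int run_start, unsigned int run_bytes)
{
  /* Pass the whole paragraph as the text array; only the bytes in
   * [run_start, run_start + run_bytes) become buffer items. */
  hb_buffer_add_utf8 (buf, paragraph, (int) strlen (paragraph),
                      run_start, (int) run_bytes);
}
]]></programlisting>
     <para>
       After the call, the buffer length reflects only the run itself.
       The cluster value of each item is its offset from the start of
       the paragraph, which makes it straightforward to map shaped
       output back to the original text.
     </para>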
     <para>
       You can also add Unicode code points directly with
       <function>hb_buffer_add_codepoints()</function>. The arguments
       to this function are the same as those for the UTF
       encodings, except that the text array holds code points. But it
       is particularly important to note that HarfBuzz does not do
       validity checking on text that is added this way. The UTF
       functions replace invalid code points as they decode, but with
       <function>hb_buffer_add_codepoints()</function> it is up
       to you to do any sanity checking necessary.
     </para>

   </section>

   <section id="setting-buffer-properties">
     <title>Setting buffer properties</title>
     <para>
       Buffers containing input characters still need several
       properties set before HarfBuzz can shape their text correctly.
     </para>
     <para>
       Initially, all buffers are set to the
       <literal>HB_BUFFER_CONTENT_TYPE_INVALID</literal> content
       type. After adding text, the buffer should be set to
       <literal>HB_BUFFER_CONTENT_TYPE_UNICODE</literal> instead, which
       indicates that it contains un-shaped input
       characters. After shaping, the buffer will have the
       <literal>HB_BUFFER_CONTENT_TYPE_GLYPHS</literal> content type.
     </para>
     <para>
       <function>hb_buffer_add_utf8()</function> and the
       other UTF functions set the content type of their buffer
       automatically. But if you are reusing a buffer, you may want to
       check its state with
       <function>hb_buffer_get_content_type(buf)</function>. If
       necessary, you can set the content type with
     </para>
     <programlisting language="C">
       hb_buffer_set_content_type(buf, HB_BUFFER_CONTENT_TYPE_UNICODE);
     </programlisting>
     <para>
       to prepare for shaping.
     </para>
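     <para>
       For a reused buffer, a check along these lines can confirm the
       state before shaping. The helper name
       <function>buffer_holds_unicode()</function> is an assumption made
       for this sketch.
     </para>
     <programlisting language="C"><![CDATA[
#include <hb.h>

/* Hypothetical helper: returns 1 if buf currently holds unshaped
 * Unicode input, and 0 if it is empty or already holds glyphs. */
int
buffer_holds_unicode (hb_buffer_t *buf)
{
  return hb_buffer_get_content_type (buf) == HB_BUFFER_CONTENT_TYPE_UNICODE;
}
]]></programlisting>
     <para>
       A freshly created buffer fails this check, and so does one that
       has just been cleared. The check passes once text has been added
       with one of the UTF functions.
     </para>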
     <para>
       Buffers also need to carry information about the script,
       language, and text direction of their contents. You can set
       these properties individually:
     </para>
     <programlisting language="C">
       hb_buffer_set_direction(buf, HB_DIRECTION_LTR);
       hb_buffer_set_script(buf, HB_SCRIPT_LATIN);
       hb_buffer_set_language(buf, hb_language_from_string("en", -1));
     </programlisting>
     <para>
       However, since these properties are often repeated for
       multiple text runs, you can also save them in a
       <literal>hb_segment_properties_t</literal> for reuse:
     </para>
     <programlisting language="C">
       hb_segment_properties_t savedprops;
       hb_buffer_get_segment_properties (buf, &amp;savedprops);
       ...
       hb_buffer_set_segment_properties (buf2, &amp;savedprops);
     </programlisting>
     <para>
       HarfBuzz also provides getter functions to retrieve a buffer's
       direction, script, and language properties individually.
     </para>
     <para>
       HarfBuzz recognizes four text directions in
       <type>hb_direction_t</type>: left-to-right
       (<literal>HB_DIRECTION_LTR</literal>), right-to-left (<literal>HB_DIRECTION_RTL</literal>),
       top-to-bottom (<literal>HB_DIRECTION_TTB</literal>), and
       bottom-to-top (<literal>HB_DIRECTION_BTT</literal>).  For the
       script property, HarfBuzz uses identifiers based on the
       <ulink
       url="https://unicode.org/iso15924/">ISO 15924
       standard</ulink>. For languages, HarfBuzz uses tags based on the
       <ulink url="https://tools.ietf.org/html/bcp47">IETF BCP 47</ulink> standard.
     </para>
     <para>
       Helper functions are provided to convert character strings into
       the necessary script and language tag types.
     </para>
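     <para>
       As a sketch of how these helpers might be used, the hypothetical
       function below sets all three properties from strings, such as
       values read from a configuration file. The function name and its
       parameters are assumptions made for illustration.
     </para>
     <programlisting language="C"><![CDATA[
#include <hb.h>

/* Hypothetical helper: sets direction, script, and language on buf
 * from NUL-terminated strings. */
void
set_properties_from_strings (hb_buffer_t *buf, const char *direction,
                             const char *script, const char *language)
{
  /* "ltr", "rtl", "ttb", or "btt". */
  hb_buffer_set_direction (buf, hb_direction_from_string (direction, -1));
  /* A four-letter ISO 15924 tag, such as "Arab" or "Latn". */
  hb_buffer_set_script (buf, hb_script_from_string (script, -1));
  /* A BCP 47 language tag, such as "ar" or "en". */
  hb_buffer_set_language (buf, hb_language_from_string (language, -1));
}
]]></programlisting>
     <para>
       For example, calling
       <function>set_properties_from_strings(buf, "rtl", "Arab", "ar")</function>
       has the same effect as setting
       <literal>HB_DIRECTION_RTL</literal>,
       <literal>HB_SCRIPT_ARABIC</literal>, and the language returned by
       <function>hb_language_from_string("ar", -1)</function>
       individually.
     </para>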
     <para>
       Two additional buffer properties to be aware of are the
       "invisible glyph" and the replacement code point. The
       replacement code point is inserted into buffer output in place of
       any invalid code points encountered in the input. By default, it
       is the Unicode <literal>REPLACEMENT CHARACTER</literal> code
       point, <literal>U+FFFD</literal> "&#xFFFD;". You can change this with
     </para>
     <programlisting language="C">
       hb_buffer_set_replacement_codepoint(buf, replacement);
     </programlisting>
     <para>
       passing in the replacement Unicode code point as the
       <parameter>replacement</parameter> parameter.
     </para>
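     <para>
       The sketch below shows one way to use a custom replacement code
       point. The helper
       <function>create_buffer_with_replacement()</function> is
       hypothetical, and the sketch assumes that the substitution
       happens while text is being added, which is why the replacement
       is set before the call to
       <function>hb_buffer_add_utf8()</function>.
     </para>
     <programlisting language="C"><![CDATA[
#include <hb.h>

/* Hypothetical helper: creates a buffer and adds possibly malformed
 * UTF-8 text to it, substituting `replacement` for invalid input.
 * The caller owns the returned buffer and must destroy it. */
hb_buffer_t *
create_buffer_with_replacement (const char *text, hb_codepoint_t replacement)
{
  hb_buffer_t *buf = hb_buffer_create ();

  /* Set the replacement first, so it is in effect when text is added. */
  hb_buffer_set_replacement_codepoint (buf, replacement);
  hb_buffer_add_utf8 (buf, text, -1, 0, -1);
  return buf;
}
]]></programlisting>
     <para>
       With a replacement of <literal>U+003F</literal> (a question
       mark), an input containing the stray byte
       <literal>0xFF</literal> between two ASCII letters yields three
       buffer items, with the question mark in the middle.
     </para>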
     <para>
       The invisible glyph is used to replace all output glyphs that
       are invisible. By default, the standard space character
       <literal>U+0020</literal> is used; you can replace this (for
       example, when using a font that provides script-specific
       spaces) with
     </para>
     <programlisting language="C">
       hb_buffer_set_invisible_glyph(buf, replacement_glyph);
     </programlisting>
     <para>
       Do note that in the <parameter>replacement_glyph</parameter>
       parameter, you must provide the glyph ID of the replacement you
       wish to use, not the Unicode code point.
     </para>
     <para>
       HarfBuzz supports a few additional flags you might want to set
       on your buffer under certain circumstances. The
       <literal>HB_BUFFER_FLAG_BOT</literal> and
       <literal>HB_BUFFER_FLAG_EOT</literal> flags tell HarfBuzz
       that the buffer represents the beginning or end (respectively)
       of a text element (such as a paragraph or other block). Knowing
       this allows HarfBuzz to apply certain contextual font features
       when shaping, such as initial or final variants in connected
       scripts.
     </para>
     <para>
       <literal>HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES</literal>
       tells HarfBuzz not to hide glyphs with the
       <literal>Default_Ignorable</literal> property in Unicode. This
       property designates control characters and other non-printing
       code points, such as joiners and variation selectors. Normally
       HarfBuzz replaces them in the output buffer with zero-width
       space glyphs (using the "invisible glyph" property discussed
       above); setting this flag causes them to be printed, which can
       be helpful for troubleshooting.
     </para>
     <para>
       Conversely, setting the
       <literal>HB_BUFFER_FLAG_REMOVE_DEFAULT_IGNORABLES</literal> flag
       tells HarfBuzz to remove <literal>Default_Ignorable</literal>
       glyphs from the output buffer entirely. Finally, setting the
       <literal>HB_BUFFER_FLAG_DO_NOT_INSERT_DOTTED_CIRCLE</literal>
       flag tells HarfBuzz not to insert the dotted-circle glyph
       (<literal>U+25CC</literal>, "&#x25CC;"), which is normally
       inserted into buffer output when broken character sequences are
       encountered (such as combining marks that are not attached to a
       base character).
     </para>
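     <para>
       Flags are set with <function>hb_buffer_set_flags()</function>,
       which takes the complete flag set rather than a single flag. The
       following sketch, built around the hypothetical helper
       <function>mark_whole_paragraph()</function>, adds flags to
       whatever is already set on the buffer:
     </para>
     <programlisting language="C"><![CDATA[
#include <hb.h>

/* Hypothetical helper: marks buf as holding a complete paragraph and
 * suppresses dotted-circle insertion, keeping any flags already set. */
void
mark_whole_paragraph (hb_buffer_t *buf)
{
  unsigned int flags = hb_buffer_get_flags (buf);

  flags |= HB_BUFFER_FLAG_BOT
         | HB_BUFFER_FLAG_EOT
         | HB_BUFFER_FLAG_DO_NOT_INSERT_DOTTED_CIRCLE;

  /* hb_buffer_set_flags() replaces the whole set, so pass the
   * combined value rather than only the new flags. */
  hb_buffer_set_flags (buf, (hb_buffer_flags_t) flags);
}
]]></programlisting>
     <para>
       Reading the current flags first avoids silently dropping a flag
       that another part of the program has set.
     </para>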
   </section>

   <section id="customizing-unicode-functions">
     <title>Customizing Unicode functions</title>
     <para>
       HarfBuzz requires some simple functions for accessing
       information from the Unicode Character Database (such as the
       <literal>General_Category</literal> (gc) and
       <literal>Script</literal> (sc) properties) that is useful
       for shaping, as well as some useful operations like composing and
       decomposing code points.
     </para>
     <para>
       HarfBuzz includes its own internal, lightweight set of Unicode
       functions. At build time, it is also possible to compile support
       for some other options, such as the Unicode functions provided
       by GLib or the International Components for Unicode (ICU)
       library. Generally, this option is only of interest for client
       programs that have specific integration requirements or that do
       a significant amount of customization.
     </para>
     <para>
       If your program has access to other Unicode functions, however,
       such as through a system library or application framework, you
       might prefer to use those instead of the built-in
       options. HarfBuzz supports this by implementing its Unicode
       functions as a set of virtual methods that you can replace
       without otherwise affecting HarfBuzz's functionality.
     </para>
     <para>
       The Unicode functions are specified in a structure called
       <literal>unicode_funcs</literal> which is attached to each
       buffer. But even though <literal>unicode_funcs</literal> is
       associated with a <type>hb_buffer_t</type>, the functions
       themselves are called by other HarfBuzz APIs that access
       buffers, so it would be unwise for you to hook different
       functions into different buffers.
     </para>
     <para>
       In addition, you can mark your <literal>unicode_funcs</literal>
       as immutable by calling
       <function>hb_unicode_funcs_make_immutable (ufuncs)</function>.
       This is especially useful if your code is a
       library or framework that will have its own client programs. By
       marking your Unicode function choices as immutable, you prevent
       your own client programs from changing the
       <literal>unicode_funcs</literal> configuration and introducing
       inconsistencies and errors downstream.
     </para>
     <para>
       You can retrieve the Unicode-functions configuration for
       your buffer by calling <function>hb_buffer_get_unicode_funcs()</function>:
     </para>
     <programlisting language="C">
       hb_unicode_funcs_t *ufunctions;
       ufunctions = hb_buffer_get_unicode_funcs(buf);
     </programlisting>
     <para>
       The current version of <literal>unicode_funcs</literal> uses six functions:
     </para>
     <itemizedlist>
       <listitem>
 	<para>
 	  <function>hb_unicode_combining_class_func_t</function>:
 	  returns the Canonical Combining Class of a code point.
       	</para>
       </listitem>
       <listitem>
 	<para>
 	  <function>hb_unicode_general_category_func_t</function>:
 	  returns the General Category (gc) of a code point.
       	</para>
       </listitem>
       <listitem>
 	<para>
 	  <function>hb_unicode_mirroring_func_t</function>: returns
 	  the Mirroring Glyph code point (for bi-directional
 	  replacement) of a code point.
       	</para>
       </listitem>
       <listitem>
 	<para>
 	  <function>hb_unicode_script_func_t</function>: returns the
 	  Script (sc) property of a code point.
       	</para>
       </listitem>
       <listitem>
 	<para>
 	  <function>hb_unicode_compose_func_t</function>: returns the
 	  canonical composition of a sequence of two code points.
 	</para>
       </listitem>
       <listitem>
 	<para>
 	  <function>hb_unicode_decompose_func_t</function>: returns
 	  the canonical decomposition of a code point.
 	</para>
       </listitem>
     </itemizedlist>
     <para>
       Note, however, that future HarfBuzz releases may alter this set.
     </para>
     <para>
       Each Unicode function has a corresponding setter, with which you
       can install your replacement function as a callback. For example,
       to replace
       <function>hb_unicode_general_category_func_t</function>, you can call
     </para>
     <programlisting language="C">
       hb_unicode_funcs_set_general_category_func (ufuncs, func, user_data, destroy);
     </programlisting>
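     <para>
       To make this more concrete, here is a minimal sketch that
       overrides a single function and inherits the rest from the
       defaults. The callback
       <function>my_general_category()</function>, the helper
       <function>create_custom_unicode_funcs()</function>, and the
       choice to treat <literal>U+E000</literal> as a letter are all
       assumptions made for illustration.
     </para>
     <programlisting language="C"><![CDATA[
#include <stddef.h>
#include <hb.h>

/* Hypothetical callback: reports U+E000 as a letter and defers every
 * other code point to the parent (default) Unicode functions. */
static hb_unicode_general_category_t
my_general_category (hb_unicode_funcs_t *ufuncs, hb_codepoint_t unicode,
                     void *user_data)
{
  (void) user_data; /* unused in this sketch */
  if (unicode == 0xE000)
    return HB_UNICODE_GENERAL_CATEGORY_OTHER_LETTER;
  return hb_unicode_general_category (hb_unicode_funcs_get_parent (ufuncs),
                                      unicode);
}

/* Hypothetical helper: builds the customized function set. The caller
 * owns the result and releases it with hb_unicode_funcs_destroy(). */
hb_unicode_funcs_t *
create_custom_unicode_funcs (void)
{
  /* Start from the defaults, then override one function. */
  hb_unicode_funcs_t *ufuncs =
    hb_unicode_funcs_create (hb_unicode_funcs_get_default ());

  /* No user data and no destroy callback are needed here. */
  hb_unicode_funcs_set_general_category_func (ufuncs, my_general_category,
                                              NULL, NULL);

  /* Lock the configuration so that later code cannot change it. */
  hb_unicode_funcs_make_immutable (ufuncs);
  return ufuncs;
}
]]></programlisting>
     <para>
       The result can be attached to a buffer with
       <function>hb_buffer_set_unicode_funcs(buf, ufuncs)</function>.
       The buffer takes its own reference, so you can call
       <function>hb_unicode_funcs_destroy(ufuncs)</function> afterwards.
     </para>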
     <para>
       Virtualizing this set of Unicode functions is primarily intended
       to improve portability. There is no need for every client
       program to make the effort to replace the default options, so if
       you are unsure, do not feel any pressure to customize
       <literal>unicode_funcs</literal>.
     </para>
   </section>

 </chapter>