<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
                  "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
  <!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
  <!ENTITY version SYSTEM "version.xml">
]>
<chapter id="buffers-language-script-and-direction">
  <title>Buffers, language, script and direction</title>
  <para>
    The input to the HarfBuzz shaper is a series of Unicode characters, stored in a
    buffer. In this chapter, we'll look at how to set up a buffer with
    the text that we want and how to customize the properties of the
    buffer. We'll also look at a piece of lower-level machinery that
    you will need to understand before proceeding: the functions that
    HarfBuzz uses to retrieve Unicode information.
  </para>
  <para>
    After shaping is complete, HarfBuzz puts its output back
    into the buffer. But getting that output requires setting up a
    face and a font first, so we will look at that in the next chapter
    instead of here.
  </para>
  <section id="creating-and-destroying-buffers">
    <title>Creating and destroying buffers</title>
    <para>
      As we saw in our <emphasis>Getting Started</emphasis> example, a
      buffer is created and
      initialized with <function>hb_buffer_create()</function>. This
      produces a new, empty buffer object, instantiated with some
      default values and ready to accept your Unicode strings.
    </para>
    <para>
      HarfBuzz manages the memory of objects (such as buffers) that it
      creates, so you don't have to. When you have finished working on
      a buffer, you can call <function>hb_buffer_destroy()</function>:
    </para>
    <programlisting language="C">
      hb_buffer_t *buf = hb_buffer_create();
      ...
      hb_buffer_destroy(buf);
    </programlisting>
    <para>
      This will destroy the object and free its associated memory,
      unless some other part of the program holds a reference to this
      buffer. If you acquire a HarfBuzz buffer from another subsystem
      and want to ensure that it is not freed when someone
      else destroys it, you should increase its reference count:
    </para>
    <programlisting language="C">
      void somefunc(hb_buffer_t *buf) {
        buf = hb_buffer_reference(buf);
        ...
    </programlisting>
    <para>
      And then decrease it once you're done with it:
    </para>
    <programlisting language="C">
        hb_buffer_destroy(buf);
      }
    </programlisting>
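    <para>
      The reference-counting behavior described above can be modeled
      in a few lines of plain C. This is only a sketch under stated
      assumptions, not HarfBuzz's implementation: the
      <literal>sketch_</literal> names are hypothetical, the counter
      is not thread-safe, and the global
      <literal>sketch_live_buffers</literal> exists only to make the
      effect observable.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted object (not hb_buffer_t). */
typedef struct {
    int ref_count;
} sketch_buffer_t;

/* Number of sketch buffers that have been created but not yet freed. */
static int sketch_live_buffers = 0;

/* Create an object owned by the caller: the count starts at 1. */
sketch_buffer_t *sketch_buffer_create(void) {
    sketch_buffer_t *buf = malloc(sizeof *buf);
    if (buf == NULL)
        return NULL;
    buf->ref_count = 1;
    sketch_live_buffers++;
    return buf;
}

/* Take one more reference; returns the same pointer for convenience. */
sketch_buffer_t *sketch_buffer_reference(sketch_buffer_t *buf) {
    if (buf != NULL)
        buf->ref_count++;
    return buf;
}

/* Drop one reference; memory is freed only when the last one is dropped. */
void sketch_buffer_destroy(sketch_buffer_t *buf) {
    if (buf == NULL)
        return;
    if (--buf->ref_count == 0) {
        free(buf);
        sketch_live_buffers--;
    }
}
```
]]></programlisting>
    <para>
      In this model, a "destroy" call made by one holder leaves the
      object alive for as long as another holder still has a
      reference, which is the guarantee the
      <function>somefunc()</function> example relies on.
    </para>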
    <para>
      While we are on the subject of reference-counting buffers, it is
      worth noting that an individual buffer can only meaningfully be
      used by one thread at a time.
    </para>
    <para>
      To throw away all the data in your buffer and start from scratch,
      call <function>hb_buffer_reset(buf)</function>. If you want to
      throw away the string in the buffer but keep buffer-level
      settings such as the Unicode functions and the replacement code
      point, you can
      instead call <function>hb_buffer_clear_contents(buf)</function>.
    </para>
  </section>

  <section id="adding-text-to-the-buffer">
    <title>Adding text to the buffer</title>
    <para>
      Now we have a brand new HarfBuzz buffer. Let's start filling it
      with text! From HarfBuzz's perspective, a buffer is just a stream
      of Unicode code points, but your input string is probably in one of
      the standard Unicode character encodings (UTF-8, UTF-16, or
      UTF-32). HarfBuzz provides convenience functions that accept
      each of these encodings:
      <function>hb_buffer_add_utf8()</function>,
      <function>hb_buffer_add_utf16()</function>, and
      <function>hb_buffer_add_utf32()</function>. Other than the
      character encoding they accept, they function identically.
    </para>
    <para>
      You can add UTF-8 text to a buffer by passing in the text array,
      the array's length, an offset into the array for the first
      character to add, and the length of the segment to add:
    </para>
    <programlisting language="C">
      void
      hb_buffer_add_utf8 (hb_buffer_t *buf,
                          const char *text,
                          int text_length,
                          unsigned int item_offset,
                          int item_length);
    </programlisting>
    <para>
      So, in practice, you can say:
    </para>
    <programlisting language="C">
      hb_buffer_add_utf8(buf, text, strlen(text), 0, strlen(text));
    </programlisting>
    <para>
      This will append your new characters to
      <parameter>buf</parameter>, not replace its existing
      contents. Also, note that you can use <literal>-1</literal> in
      place of the first instance of <function>strlen(text)</function>
      if your text array is NUL-terminated. Similarly, you can use
      <literal>-1</literal> as the final argument if you want to add
      everything from <parameter>item_offset</parameter> to the end of
      the text.
    </para>
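    <para>
      As a rough illustration of the <literal>-1</literal>
      conventions, the following sketch resolves the two length
      arguments the way the paragraph above describes. The
      <literal>sketch_</literal> names are hypothetical and this is a
      model of the documented behavior, not HarfBuzz's own code. It
      assumes that <parameter>text</parameter> is NUL-terminated
      whenever <literal>-1</literal> is passed for
      <parameter>text_length</parameter>.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <string.h>

/* Hypothetical result type: the fully resolved range of text to add. */
typedef struct {
    int text_length;           /* total length of the text array, in bytes */
    unsigned int item_offset;  /* first byte of the segment to add */
    int item_length;           /* length of the segment to add, in bytes */
} sketch_range_t;

/* Replace any -1 length with the value it stands for. */
sketch_range_t sketch_resolve_range(const char *text,
                                    int text_length,
                                    unsigned int item_offset,
                                    int item_length) {
    sketch_range_t range;

    /* -1 for text_length: measure the NUL-terminated string. */
    if (text_length == -1)
        text_length = (int) strlen(text);

    /* -1 for item_length: everything from item_offset to the end. */
    if (item_length == -1)
        item_length = text_length - (int) item_offset;

    range.text_length = text_length;
    range.item_offset = item_offset;
    range.item_length = item_length;
    return range;
}
```
]]></programlisting>
    <para>
      With this model, passing <literal>-1</literal> for both lengths
      and <literal>0</literal> for the offset is equivalent to the
      explicit <function>strlen(text)</function> call shown earlier.
    </para>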
    <para>
      Whatever <parameter>item_offset</parameter> and
      <parameter>item_length</parameter> you provide, HarfBuzz will also
      attempt to grab the five characters <emphasis>before</emphasis>
      the offset point and the five characters
      <emphasis>after</emphasis> the designated end. These are the
      before and after "context" segments, which HarfBuzz uses
      internally to make shaping decisions. They will not be part of
      the final output, but they ensure that HarfBuzz's
      script-specific shaping operations are correct. If there are
      fewer than five characters available for the before or after
      contexts, HarfBuzz will just grab what is there.
    </para>
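    <para>
      The context rule can be sketched as a small range calculation.
      This is a simplified model with hypothetical
      <literal>sketch_</literal> names, not HarfBuzz's implementation.
      It assumes that all positions are counted in code points (as
      they would be for a UTF-32 array) and that the item lies
      entirely within the text.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <stddef.h>

/* Number of context characters on each side, as described in the text. */
#define SKETCH_CONTEXT_LENGTH 5

/* Hypothetical result type: where the two context segments lie. */
typedef struct {
    size_t pre_start;    /* index of the first "before" context character */
    size_t pre_length;   /* number of "before" context characters (0..5) */
    size_t post_start;   /* index of the first "after" context character */
    size_t post_length;  /* number of "after" context characters (0..5) */
} sketch_context_t;

/* Compute the context segments around an item.
   Precondition: item_offset + item_length <= text_length. */
sketch_context_t sketch_context_for_item(size_t text_length,
                                         size_t item_offset,
                                         size_t item_length) {
    sketch_context_t ctx;
    size_t item_end = item_offset + item_length;
    size_t remaining = text_length - item_end;

    /* Take up to five characters before the item, fewer near the start. */
    ctx.pre_length = item_offset < SKETCH_CONTEXT_LENGTH
                         ? item_offset : SKETCH_CONTEXT_LENGTH;
    ctx.pre_start = item_offset - ctx.pre_length;

    /* Take up to five characters after the item, fewer near the end. */
    ctx.post_length = remaining < SKETCH_CONTEXT_LENGTH
                          ? remaining : SKETCH_CONTEXT_LENGTH;
    ctx.post_start = item_end;
    return ctx;
}
```
]]></programlisting>
    <para>
      For a 20-character text with an item at offset 8 of length 4,
      the model selects characters 3 to 7 as the before context and
      characters 12 to 16 as the after context.
    </para>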
    <para>
      For longer text runs, such as full paragraphs, it might be
      tempting to add only smaller sub-segments to a buffer and
      shape them in piecemeal fashion. Generally, this is not a good
      idea, because many shaping decisions
      depend on this context information. For example, in Arabic
      and other connected scripts, HarfBuzz needs to know the code
      points before and after each character in order to correctly
      determine which glyph to return.
    </para>
    <para>
      The safest approach is to pass in all of the text available, then
      use <parameter>item_offset</parameter> and
      <parameter>item_length</parameter> to indicate which characters you
      want shaped, so that HarfBuzz has access to any context.
    </para>
    <para>
      You can also add Unicode code points directly with
      <function>hb_buffer_add_codepoints()</function>. The arguments
      to this function are the same as those for the UTF
      encodings. It is particularly important to note, however, that
      HarfBuzz does only minimal validity checking on the text that is
      added to a buffer. The UTF functions replace invalid input with
      the replacement code point, whereas
      <function>hb_buffer_add_codepoints()</function> does not check
      its input at all, so it is up to you to do any deeper sanity
      checking necessary.
    </para>
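    <para>
      One possible sanity check, sketched here with hypothetical
      <literal>sketch_</literal> names, is to make sure that every
      code point is a Unicode scalar value before handing the array to
      HarfBuzz. The sketch assumes that "invalid" means a surrogate
      (<literal>U+D800</literal> to <literal>U+DFFF</literal>) or a
      value above <literal>U+10FFFF</literal>. Your application may
      need stricter rules.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* U+FFFD REPLACEMENT CHARACTER, a common choice for bad input. */
#define SKETCH_REPLACEMENT 0xFFFDu

/* A Unicode scalar value is any code point up to U+10FFFF
   that is not a surrogate (U+D800..U+DFFF). */
int sketch_is_scalar_value(uint32_t cp) {
    return cp < 0xD800u || (cp > 0xDFFFu && cp <= 0x10FFFFu);
}

/* Overwrite every invalid entry in place with `replacement`.
   Returns the number of entries that were replaced. */
size_t sketch_sanitize_codepoints(uint32_t *text, size_t length,
                                  uint32_t replacement) {
    size_t replaced = 0;
    for (size_t i = 0; i < length; i++) {
        if (!sketch_is_scalar_value(text[i])) {
            text[i] = replacement;
            replaced++;
        }
    }
    return replaced;
}
```
]]></programlisting>
    <para>
      Running such a pass first means that the array you later add to
      the buffer contains only values that are legal in Unicode text.
    </para>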

  </section>

  <section id="setting-buffer-properties">
    <title>Setting buffer properties</title>
    <para>
      Buffers containing input characters still need several
      properties set before HarfBuzz can shape their text correctly.
    </para>
    <para>
      Initially, all buffers are set to the
      <literal>HB_BUFFER_CONTENT_TYPE_INVALID</literal> content
      type. After adding text, the buffer should be set to
      <literal>HB_BUFFER_CONTENT_TYPE_UNICODE</literal> instead, which
      indicates that it contains un-shaped input
      characters. After shaping, the buffer will have the
      <literal>HB_BUFFER_CONTENT_TYPE_GLYPHS</literal> content type.
    </para>
    <para>
      <function>hb_buffer_add_utf8()</function> and the
      other UTF functions set the content type of their buffer
      automatically. But if you are reusing a buffer, you may want to
      check its state with
      <function>hb_buffer_get_content_type(buffer)</function>. If
      necessary, you can set the content type with
    </para>
    <programlisting language="C">
      hb_buffer_set_content_type(buf, HB_BUFFER_CONTENT_TYPE_UNICODE);
    </programlisting>
    <para>
      to prepare for shaping.
    </para>
    <para>
      Buffers also need to carry information about the script,
      language, and text direction of their contents. You can set
      these properties individually:
    </para>
    <programlisting language="C">
      hb_buffer_set_direction(buf, HB_DIRECTION_LTR);
      hb_buffer_set_script(buf, HB_SCRIPT_LATIN);
      hb_buffer_set_language(buf, hb_language_from_string("en", -1));
    </programlisting>
    <para>
      However, since these properties are often repeated for
      multiple text runs, you can also save them in a
      <literal>hb_segment_properties_t</literal> for reuse:
    </para>
    <programlisting language="C">
      hb_segment_properties_t savedprops;
      hb_buffer_get_segment_properties (buf, &amp;savedprops);
      ...
      hb_buffer_set_segment_properties (buf2, &amp;savedprops);
    </programlisting>
    <para>
      HarfBuzz also provides getter functions to retrieve a buffer's
      direction, script, and language properties individually.
    </para>
    <para>
      HarfBuzz recognizes four text directions in
      <type>hb_direction_t</type>: left-to-right
      (<literal>HB_DIRECTION_LTR</literal>), right-to-left
      (<literal>HB_DIRECTION_RTL</literal>),
      top-to-bottom (<literal>HB_DIRECTION_TTB</literal>), and
      bottom-to-top (<literal>HB_DIRECTION_BTT</literal>). For the
      script property, HarfBuzz uses identifiers based on the
      <ulink url="https://unicode.org/iso15924/">ISO 15924
      standard</ulink>. For languages, HarfBuzz uses tags based on the
      <ulink url="https://tools.ietf.org/html/bcp47">IETF BCP 47</ulink> standard.
    </para>
    <para>
      Helper functions, such as
      <function>hb_script_from_string()</function> and
      <function>hb_language_from_string()</function>, are provided to
      convert character strings into
      the necessary script and language tag types.
    </para>
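    <para>
      To give a feel for what a script identifier looks like, the
      sketch below packs a four-letter ISO 15924 code into a single
      32-bit tag, first letter in the most significant byte. This
      mirrors the way HarfBuzz script values such as
      <literal>HB_SCRIPT_LATIN</literal> correspond to the code
      <literal>Latn</literal>, but the function itself is hypothetical
      and does none of the normalization that the real helper
      functions perform.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <stdint.h>
#include <string.h>

/* Pack a four-letter ISO 15924 code such as "Latn" into a 32-bit tag.
   Returns 0 if the input is NULL or not exactly four characters long. */
uint32_t sketch_tag_from_iso15924(const char *code) {
    if (code == NULL || strlen(code) != 4)
        return 0;
    return ((uint32_t) (unsigned char) code[0] << 24) |
           ((uint32_t) (unsigned char) code[1] << 16) |
           ((uint32_t) (unsigned char) code[2] << 8)  |
            (uint32_t) (unsigned char) code[3];
}
```
]]></programlisting>
    <para>
      In real code, prefer the predefined
      <literal>HB_SCRIPT_*</literal> constants or the helper
      functions over building tags by hand.
    </para>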
    <para>
      Two additional buffer properties to be aware of are the
      "invisible glyph" and the replacement code point. The
      replacement code point is inserted into buffer output in place of
      any invalid code points encountered in the input. By default, it
      is the Unicode <literal>REPLACEMENT CHARACTER</literal> code
      point, <literal>U+FFFD</literal> "�". You can change this with
    </para>
    <programlisting language="C">
      hb_buffer_set_replacement_codepoint(buf, replacement);
    </programlisting>
    <para>
      passing in the replacement Unicode code point as the
      <parameter>replacement</parameter> parameter.
    </para>
    <para>
      The invisible glyph is used to replace all output glyphs that
      are invisible. By default, the glyph for the standard space
      character <literal>U+0020</literal> is used; you can replace
      this (for
      example, when using a font that provides script-specific
      spaces) with
    </para>
    <programlisting language="C">
      hb_buffer_set_invisible_glyph(buf, replacement_glyph);
    </programlisting>
    <para>
      Do note that in the <parameter>replacement_glyph</parameter>
      parameter, you must provide the glyph ID of the replacement you
      wish to use, not the Unicode code point.
    </para>
    <para>
      HarfBuzz supports a few additional flags you might want to set
      on your buffer under certain circumstances. The
      <literal>HB_BUFFER_FLAG_BOT</literal> and
      <literal>HB_BUFFER_FLAG_EOT</literal> flags tell HarfBuzz
      that the buffer represents the beginning or end (respectively)
      of a text element (such as a paragraph or other block). Knowing
      this allows HarfBuzz to apply certain contextual font features
      when shaping, such as initial or final variants in connected
      scripts.
    </para>
    <para>
      <literal>HB_BUFFER_FLAG_PRESERVE_DEFAULT_IGNORABLES</literal>
      tells HarfBuzz not to hide glyphs with the
      <literal>Default_Ignorable</literal> property in Unicode. This
      property designates control characters and other non-printing
      code points, such as joiners and variation selectors. Normally
      HarfBuzz replaces them in the output buffer with zero-width
      space glyphs (using the "invisible glyph" property discussed
      above); setting this flag causes them to be printed, which can
      be helpful for troubleshooting.
    </para>
    <para>
      Conversely, setting the
      <literal>HB_BUFFER_FLAG_REMOVE_DEFAULT_IGNORABLES</literal> flag
      tells HarfBuzz to remove <literal>Default_Ignorable</literal>
      glyphs from the output buffer entirely. Finally, setting the
      <literal>HB_BUFFER_FLAG_DO_NOT_INSERT_DOTTED_CIRCLE</literal>
      flag tells HarfBuzz not to insert the dotted-circle glyph
      (<literal>U+25CC</literal>, "◌"), which is normally
      inserted into buffer output when broken character sequences are
      encountered (such as combining marks that are not attached to a
      base character).
    </para>
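    <para>
      Buffer flags are bit values that are combined with bitwise OR
      and applied with <function>hb_buffer_set_flags()</function>.
      As an illustration of how the beginning-of-text and end-of-text
      flags might be chosen when one paragraph is shaped as several
      runs, consider the sketch below. The
      <literal>SKETCH_FLAG_*</literal> constants and their values are
      stand-ins invented for this example; real code would use the
      <literal>HB_BUFFER_FLAG_*</literal> constants instead.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <stddef.h>

/* Stand-in flag bits for this sketch only (not the HarfBuzz constants). */
enum {
    SKETCH_FLAG_DEFAULT = 0x0,
    SKETCH_FLAG_BOT     = 0x1,  /* run starts the text element */
    SKETCH_FLAG_EOT     = 0x2   /* run ends the text element */
};

/* Choose flags for run number `run_index` out of `run_count` runs
   that together make up one paragraph. */
unsigned int sketch_flags_for_run(size_t run_index, size_t run_count) {
    unsigned int flags = SKETCH_FLAG_DEFAULT;

    /* Only the first run sits at the beginning of the paragraph. */
    if (run_index == 0)
        flags |= SKETCH_FLAG_BOT;

    /* Only the last run sits at the end of the paragraph. */
    if (run_index + 1 == run_count)
        flags |= SKETCH_FLAG_EOT;

    return flags;
}
```
]]></programlisting>
    <para>
      A paragraph shaped as a single run gets both flags, while runs
      in the middle of a paragraph get neither.
    </para>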
  </section>

  <section id="customizing-unicode-functions">
    <title>Customizing Unicode functions</title>
    <para>
      HarfBuzz requires some simple functions for accessing
      information from the Unicode Character Database (such as the
      <literal>General_Category</literal> (gc) and
      <literal>Script</literal> (sc) properties) that is useful
      for shaping, as well as some useful operations like composing and
      decomposing code points.
    </para>
    <para>
      HarfBuzz includes its own internal, lightweight set of Unicode
      functions. At build time, it is also possible to compile support
      for some other options, such as the Unicode functions provided
      by GLib or the International Components for Unicode (ICU)
      library. Generally, this option is only of interest for client
      programs that have specific integration requirements or that do
      a significant amount of customization.
    </para>
    <para>
      If your program has access to other Unicode functions, however,
      such as through a system library or application framework, you
      might prefer to use those instead of the built-in
      options. HarfBuzz supports this by implementing its Unicode
      functions as a set of virtual methods that you can replace
      without otherwise affecting HarfBuzz's functionality.
    </para>
    <para>
      The Unicode functions are specified in a structure called
      <literal>unicode_funcs</literal> which is attached to each
      buffer. But even though <literal>unicode_funcs</literal> is
      associated with a <type>hb_buffer_t</type>, the functions
      themselves are called by other HarfBuzz APIs that access
      buffers, so it would be unwise for you to hook different
      functions into different buffers.
    </para>
    <para>
      In addition, you can mark your <literal>unicode_funcs</literal>
      as immutable by calling
      <function>hb_unicode_funcs_make_immutable (ufuncs)</function>.
      This is especially useful if your code is a
      library or framework that will have its own client programs. By
      marking your Unicode function choices as immutable, you prevent
      your own client programs from changing the
      <literal>unicode_funcs</literal> configuration and introducing
      inconsistencies and errors downstream.
    </para>
    <para>
      You can retrieve the Unicode-functions configuration for
      your buffer by calling <function>hb_buffer_get_unicode_funcs()</function>:
    </para>
    <programlisting language="C">
      hb_unicode_funcs_t *ufunctions;
      ufunctions = hb_buffer_get_unicode_funcs(buf);
    </programlisting>
    <para>
      The current version of <literal>unicode_funcs</literal> uses six functions:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          <function>hb_unicode_combining_class_func_t</function>:
          returns the Canonical Combining Class of a code point.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>hb_unicode_general_category_func_t</function>:
          returns the General Category (gc) of a code point.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>hb_unicode_mirroring_func_t</function>: returns
          the Mirroring Glyph code point (for bi-directional
          replacement) of a code point.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>hb_unicode_script_func_t</function>: returns the
          Script (sc) property of a code point.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>hb_unicode_compose_func_t</function>: returns the
          canonical composition of a sequence of two code points.
        </para>
      </listitem>
      <listitem>
        <para>
          <function>hb_unicode_decompose_func_t</function>: returns
          the canonical decomposition of a code point.
        </para>
      </listitem>
    </itemizedlist>
    <para>
      Note, however, that future HarfBuzz releases may alter this set.
    </para>
    <para>
      Each Unicode function has a corresponding setter, with which you
      can install your replacement function as a callback. For example,
      to replace
      <function>hb_unicode_general_category_func_t</function>, you can call
    </para>
    <programlisting language="C">
      hb_unicode_funcs_set_general_category_func (ufuncs, func, user_data, destroy);
    </programlisting>
    <para>
      Here, <parameter>ufuncs</parameter> is the
      <type>hb_unicode_funcs_t</type> object to modify,
      <parameter>func</parameter> is your callback,
      <parameter>user_data</parameter> is a pointer that is passed
      back to the callback, and <parameter>destroy</parameter> is a
      function called on <parameter>user_data</parameter> when it is
      no longer needed.
    </para>
    <para>
      Virtualizing this set of Unicode functions is primarily intended
      to improve portability. There is no need for every client
      program to make the effort to replace the default options, so if
      you are unsure, do not feel any pressure to customize
      <literal>unicode_funcs</literal>.
    </para>
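    <para>
      For readers who want to see the "replaceable virtual method"
      idea in miniature, the following sketch models a function table
      with a default callback, a setter, and an immutability switch.
      All <literal>sketch_</literal> names and the two-value category
      enum are hypothetical simplifications; the model also leaves out
      the <parameter>destroy</parameter> callback. It is not the
      HarfBuzz API, only an illustration of the pattern.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* Drastically simplified "general category" values for the sketch. */
enum { SKETCH_GC_OTHER = 0, SKETCH_GC_LETTER = 1 };

/* Callback type: classify a code point, with caller-supplied context. */
typedef int (*sketch_category_func_t)(uint32_t cp, void *user_data);

/* The function table: one replaceable callback plus an immutable flag. */
typedef struct {
    sketch_category_func_t general_category;
    void *user_data;
    int immutable;
} sketch_unicode_funcs_t;

/* Built-in default: only ASCII letters count as letters. */
int sketch_default_category(uint32_t cp, void *user_data) {
    (void) user_data;
    if ((cp >= 0x41u && cp <= 0x5Au) || (cp >= 0x61u && cp <= 0x7Au))
        return SKETCH_GC_LETTER;
    return SKETCH_GC_OTHER;
}

/* Example replacement: additionally treats ASCII digits as letters. */
int sketch_custom_category(uint32_t cp, void *user_data) {
    if (cp >= 0x30u && cp <= 0x39u)
        return SKETCH_GC_LETTER;
    return sketch_default_category(cp, user_data);
}

/* Start with the built-in callback installed. */
void sketch_funcs_init(sketch_unicode_funcs_t *funcs) {
    funcs->general_category = sketch_default_category;
    funcs->user_data = NULL;
    funcs->immutable = 0;
}

/* Install a replacement callback. Returns 1 on success, or 0 (and
   changes nothing) if the table has been made immutable. */
int sketch_funcs_set_general_category(sketch_unicode_funcs_t *funcs,
                                      sketch_category_func_t func,
                                      void *user_data) {
    if (funcs->immutable)
        return 0;
    funcs->general_category = func != NULL ? func : sketch_default_category;
    funcs->user_data = user_data;
    return 1;
}

/* After this call, further setter calls are rejected. */
void sketch_funcs_make_immutable(sketch_unicode_funcs_t *funcs) {
    funcs->immutable = 1;
}

/* Dispatch through the table, as a shaper would. */
int sketch_funcs_general_category(const sketch_unicode_funcs_t *funcs,
                                  uint32_t cp) {
    return funcs->general_category(cp, funcs->user_data);
}
```
]]></programlisting>
    <para>
      Callers only ever go through the dispatch function, so swapping
      the callback changes the answers they see without any change to
      the calling code.
    </para>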
  </section>

</chapter>