Background and terminology

Fonts in XFree86 : Background and terminology
Previous: Fonts in XFree86
Next: New fonts

1. Background and terminology

1.1. Characters and glyphs

A character is an abstract unit of a writing system. Examples of characters include the Latin capital letter A, the Arabic letter jim, and the dingbat black scissors.

A glyph is a shape that may represent one or many characters when displayed by a window system or printed by a printer.

While glyphs roughly correspond to characters in most cases, this correspondence is not, in general, one to one. For example, a font may have many variant forms of the capital letter A; a single fi ligature may correspond to the letters f and i.
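The lack of a one-to-one correspondence can be sketched with a toy lookup table. This is only an illustration: the table, the glyph names, and the `shape` function are all hypothetical and do not come from any real font or from X11.

```python
# Toy model of the character/glyph relationship.
# Every glyph name below is a hypothetical label, not taken from a real font.
GLYPHS_FOR = {
    "A": ["A.regular", "A.swash", "A.smallcap"],  # one character, several glyphs
    "f": ["f"],
    "i": ["i"],
    "fi": ["fi.ligature"],                        # two characters, one glyph
}

def shape(text):
    """Greedy longest match: emit the first glyph listed for the longest known chunk."""
    glyphs = []
    pos = 0
    longest = max(len(key) for key in GLYPHS_FOR)
    while pos < len(text):
        for length in range(min(longest, len(text) - pos), 0, -1):
            chunk = text[pos:pos + length]
            if chunk in GLYPHS_FOR:
                glyphs.append(GLYPHS_FOR[chunk][0])
                pos += length
                break
        else:
            # No entry for this character: emit a placeholder glyph name.
            glyphs.append(".notdef")
            pos += 1
    return glyphs
```

Here `shape("fi")` yields a single glyph for two characters, while the entry for `"A"` lists three candidate glyphs for one character.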

A coded character set is a set of characters together with a mapping from integer codes -- known as codepoints -- to characters. Examples of coded character sets include US-ASCII, ISO 8859-1, KOI8-R, and JIS X 0208(1990).
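To make the definition concrete, the following minimal sketch looks up the same integer code in two of the coded character sets named above. It relies on Python's built-in codecs rather than anything in X11, and the helper name `character_for` is an assumption made for this example.

```python
# The same 8-bit code is assigned to different characters
# in different coded character sets.
CODE = 0xC1

def character_for(code, charset):
    """Return the character that the named 8-bit charset assigns to `code`."""
    return bytes([code]).decode(charset)

latin1_char = character_for(CODE, "iso8859-1")  # LATIN CAPITAL LETTER A WITH ACUTE
koi8r_char = character_for(CODE, "koi8-r")      # CYRILLIC SMALL LETTER A
```

The code 0xC1 is identical in both calls; only the mapping from codes to characters differs.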

A coded character set need not use 8-bit integers to index characters. Many early mainframes used 6-bit character sets, while 16-bit (or more) character sets are necessary for ideographic writing systems.

1.2. Font files, fonts, and XLFD

Traditionally, typographers speak about typefaces and founts (we use the traditional British spelling to distinguish founts from digital fonts). A typeface is a particular style or design, such as Times Italic, while a fount is a molten-lead incarnation of a given typeface at a given size.

Digital fonts come in font files. A font file contains all the information necessary for generating glyphs of a given typeface, and applications using font files may access glyph information in arbitrary order.

Digital fonts may consist of bitmap data, in which case they are said to be bitmap fonts. They may also consist of a mathematical description of glyph shapes, in which case they are said to be scalable fonts. Common formats for scalable font files are Type 1 (sometimes incorrectly called ATM fonts or PostScript fonts), Speedo and TrueType.

The glyph data in a digital font needs to be indexed somehow. How this is done depends on the font file format. In the case of Type 1 fonts, glyphs are identified by glyph names. In the case of TrueType fonts, glyphs are indexed by integers corresponding to one of a number of indexing schemes (usually Unicode; see below).

The X11 system uses the data in font files to generate font instances, which are collections of glyphs at a given size indexed according to a given encoding. X11 font instances are specified using a notation known as the X Logical Font Description (XLFD). An XLFD starts with a dash `-', and consists of fourteen fields separated by dashes, for example:

-adobe-courier-medium-r-normal--0-0-0-0-m-0-iso8859-1

Of particular interest are the last two fields, `iso8859-1', which specify the font instance's encoding.
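The structure of an XLFD can be illustrated by splitting the example name into its fourteen fields. This is a minimal sketch, not how the X server parses names: the function names are assumptions, the field labels are lowercased versions of the names used in the XLFD specification, and the split assumes that no field value itself contains a dash.

```python
# Labels for the fourteen XLFD fields, in order.
XLFD_FIELDS = (
    "foundry", "family_name", "weight_name", "slant", "setwidth_name",
    "add_style_name", "pixel_size", "point_size", "resolution_x",
    "resolution_y", "spacing", "average_width",
    "charset_registry", "charset_encoding",
)

def parse_xlfd(name):
    """Split an XLFD into a dict of its fourteen fields (hypothetical helper)."""
    if not name.startswith("-"):
        raise ValueError("an XLFD must start with a dash")
    values = name[1:].split("-")
    if len(values) != len(XLFD_FIELDS):
        raise ValueError("expected 14 fields, got %d" % len(values))
    return dict(zip(XLFD_FIELDS, values))

def encoding_of(name):
    """Join the last two fields, which together name the encoding."""
    fields = parse_xlfd(name)
    return fields["charset_registry"] + "-" + fields["charset_encoding"]

EXAMPLE = "-adobe-courier-medium-r-normal--0-0-0-0-m-0-iso8859-1"
```

For the example name, the sixth field is empty (hence the double dash) and `encoding_of(EXAMPLE)` gives back `iso8859-1`.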

1.3. Unicode

Unicode (http://www.unicode.org) is a coded character set with the goal of uniquely identifying all characters for all scripts, current and historical. While Unicode was explicitly not designed as a glyph encoding scheme, it is often possible to use it as such.

Unicode is an open character set, in that codepoint assignments may be added to Unicode at any time (once specified, though, an assignment can never be changed). For this reason, a Unicode font will be sparse, and only define glyphs for a subset of the character registry of Unicode.
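Sparseness can be pictured as a partial mapping from codepoints to glyphs. The sketch below models a font as a plain dictionary; the table, its glyph names, and the `missing_codepoints` helper are hypothetical and stand in for whatever a real font file format provides.

```python
# A hypothetical sparse Unicode font: only three codepoints have glyphs.
SPARSE_FONT = {
    0x0041: "A",         # LATIN CAPITAL LETTER A
    0x00E9: "eacute",    # LATIN SMALL LETTER E WITH ACUTE
    0x2702: "scissors",  # BLACK SCISSORS
}

def missing_codepoints(font, text):
    """Return the sorted codepoints in `text` for which `font` has no glyph."""
    return sorted({ord(ch) for ch in text if ord(ch) not in font})
```

An application can use a check of this kind to decide when it needs to fall back to another font for part of a string.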

The Unicode standard is defined in parallel with ISO 10646. Assignments in the two standards are always equivalent, and this document uses the terms ``Unicode'' and ``ISO 10646'' interchangeably.

When used in X11, Unicode-encoded fonts should have the last two fields of their XLFD set to `iso10646-1'.
