Aard Dictionary


Aard Format

Container Format

The Aard Dictionary container format is a binary file format that combines dictionary metadata, a lookup index and compressed article data.

Aard files have the following layout, in order: header, metadata, index 1, index 2, articles.


Metadata is a dictionary (a set of key-value pairs) stored as a JSON-encoded string. The Aard Dictionary viewer uses the following metadata keys:


article_count
number of unique articles in the volume

Changed in version 0.9.0: In previous versions this property meant the total number of articles in the whole dictionary. Readers should examine the article_count_is_volume_total metadata property to determine whether this value refers to the number of articles in the volume or in the dictionary.

article_count_is_volume_total
New in version 0.9.0.

true if the article_count property means the total number of articles in the volume. Since aardtools 0.9.0 the compiler always sets this property to true.

dictionary’s “from” language (two or three letter ISO code)
dictionary’s “from” language (two or three letter ISO code)
dictionary title
dictionary version
dictionary description
copyright notice
full license text
description of the source from which the dictionary data originated
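
A reader could interpret article_count along the following lines. This is a minimal sketch: the helper name volume_article_count and the sample values are illustrative, and only the two key names mentioned above are taken from this document.

```python
import json


def volume_article_count(metadata_json):
    """Return article_count if it is known to be a per-volume count, else None."""
    meta = json.loads(metadata_json)
    count = meta.get("article_count")
    # Files written before 0.9.0 lack the flag; there the count
    # refers to the whole dictionary, not to this volume.
    if meta.get("article_count_is_volume_total", False):
        return count
    return None


# Hypothetical metadata strings for illustration only.
new_style = json.dumps({"article_count": 1200,
                        "article_count_is_volume_total": True})
old_style = json.dumps({"article_count": 50000})

print(volume_article_count(new_style))  # 1200
print(volume_article_count(old_style))  # None
```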

Index 1

Index 1 is a sequence of fixed-size items, each containing two values: a pointer to an Index 2 item and a pointer to an article item.
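
Because the items are fixed-size, the i-th item can be located by multiplication, without scanning. The sketch below illustrates this with Python's struct module. The item layout (two big-endian unsigned 32-bit integers) is an assumption made for the example, since this section does not state the actual layout, and read_index1_item is a hypothetical helper.

```python
import struct

# Assumption: each item is two big-endian unsigned 32-bit integers.
# The real item layout is not given in this section.
INDEX1_ITEM = struct.Struct(">LL")


def read_index1_item(index1, i):
    """Return (key_pointer, article_pointer) for the i-th Index 1 item."""
    return INDEX1_ITEM.unpack_from(index1, i * INDEX1_ITEM.size)


# Build a tiny Index 1 with three made-up items.
index1 = b"".join(INDEX1_ITEM.pack(k, a)
                  for k, a in [(0, 0), (7, 120), (15, 410)])

print(len(index1) // INDEX1_ITEM.size)  # 3
print(read_index1_item(index1, 1))      # (7, 120)
```

Constant-time access by position is what makes a binary search over the keys practical.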

Index 2

Index 2 is a sequence of variable-length items, each containing two values: the length of the dictionary key text and the key text itself.
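
A length-prefixed item like this can be read by first unpacking the length and then slicing that many bytes. The following is a sketch under stated assumptions: the length is taken to be a big-endian unsigned 16-bit integer, keys are taken to be UTF-8, and pack_key and read_key are hypothetical helpers, none of which this section specifies.

```python
import struct

# Assumption: key length is a big-endian unsigned 16-bit integer
# and key text is UTF-8 encoded.
KEY_LENGTH = struct.Struct(">H")


def pack_key(key):
    """Encode one Index 2 item: length prefix followed by key bytes."""
    data = key.encode("utf-8")
    return KEY_LENGTH.pack(len(data)) + data


def read_key(index2, pointer):
    """Decode the Index 2 item that starts at byte offset `pointer`."""
    (length,) = KEY_LENGTH.unpack_from(index2, pointer)
    start = pointer + KEY_LENGTH.size
    return index2[start:start + length].decode("utf-8")


index2 = pack_key("aard") + pack_key("zebra")

print(read_key(index2, 0))  # aard
print(read_key(index2, 6))  # zebra (2 length bytes + 4 key bytes precede it)
```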


Articles are stored as a sequence of variable-length items, each containing two values: the length of the article text and the article text itself.

See also

Module struct
Documentation for the struct module

Article Format

From the container format's perspective, an article is just a string that is stored either as is or compressed with gzip or bz2, whichever takes less space. Thus articles in Aard files may be in any format that can be represented as a string, for example plain text or HTML. The current Aard Dictionary implementation expects HTML 4 or XHTML 1.0 formatted text without the enclosing html and body tags.
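
The "whichever takes less space" rule can be sketched as follows. This is an illustration, not the compiler's actual code: it uses Python's gzip and bz2 modules, assumes UTF-8 text, and the helper names are hypothetical. This section also does not say how a reader detects which form was stored, so the trial-decompression approach below is an assumption.

```python
import bz2
import gzip
import zlib


def compress_smallest(text):
    """Return the shortest of: raw bytes, gzip output, bz2 output."""
    raw = text.encode("utf-8")
    candidates = [raw, gzip.compress(raw), bz2.compress(raw)]
    return min(candidates, key=len)


def decompress_any(data):
    """Try each decompressor in turn; fall back to treating data as raw text."""
    for decompress in (gzip.decompress, bz2.decompress):
        try:
            return decompress(data).decode("utf-8")
        except (OSError, EOFError, ValueError, zlib.error):
            continue
    return data.decode("utf-8")


short_article = "<p>hi</p>"
long_article = "<p>" + "aard " * 200 + "</p>"

# Compression overhead exceeds the savings for very short text.
print(compress_smallest(short_article) == short_article.encode("utf-8"))  # True
print(decompress_any(compress_smallest(long_article)) == long_article)    # True
```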
