Aard Format

Container Format

The Aard Dictionary container format is a binary file format that combines dictionary metadata, a lookup index, and compressed article data.

Aard files have the following layout: header, metadata, Index 1, Index 2, articles.

Metadata

Metadata is a dictionary stored as a JSON-encoded string. Aard Dictionary viewer uses the following metadata keys:

article_count
number of unique articles in the dictionary (in all volumes combined)
index_language
dictionary’s “from” language (two- or three-letter ISO code)
article_language
dictionary’s “to” language (two- or three-letter ISO code)
title
dictionary title
version
dictionary version
description
dictionary description
copyright
copyright notice
license
full license text
source
description of the source from which the dictionary data originated
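
As a rough illustration, a metadata block using these keys could be decoded with Python's json module. The sample values below are invented for the example and do not come from a real dictionary, and the sketch does not assume that every key is present.

```python
import json

# Hypothetical metadata string; the values are made up for illustration.
raw_metadata = json.dumps({
    "article_count": 2,
    "index_language": "en",
    "article_language": "de",
    "title": "Sample Dictionary",
    "version": "1.0",
})

metadata = json.loads(raw_metadata)

# Read keys defensively with .get() so a missing key does not raise KeyError.
title = metadata.get("title", "")
article_count = metadata.get("article_count", 0)
languages = (metadata.get("index_language"), metadata.get("article_language"))
```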

Index 1

Index 1 is a sequence of fixed-size items, each containing two values: a pointer to an Index 2 item and a pointer to an article item.
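
A minimal sketch of reading one Index 1 item with the struct module follows. The item layout used here (two unsigned 32-bit big-endian integers, format string ">LL") is an assumption made for the example; this section does not state the actual field widths. The names INDEX1_ITEM_FORMAT and read_index1_item are hypothetical.

```python
import struct

# Assumed layout of one Index 1 item: two unsigned 32-bit big-endian integers.
# The real format string is not given in this section.
INDEX1_ITEM_FORMAT = ">LL"
INDEX1_ITEM_SIZE = struct.calcsize(INDEX1_ITEM_FORMAT)  # 8 bytes under this assumption


def read_index1_item(index1_bytes, item_number):
    """Return (key_pointer, article_pointer) for the item at item_number."""
    offset = item_number * INDEX1_ITEM_SIZE
    return struct.unpack_from(INDEX1_ITEM_FORMAT, index1_bytes, offset)


# Build a tiny two-item index in memory and read the second item back.
sample_index1 = struct.pack(">LL", 0, 0) + struct.pack(">LL", 7, 120)
key_pointer, article_pointer = read_index1_item(sample_index1, 1)
```

Because the items are fixed-size, the offset of any item is a simple multiplication, which makes random access into Index 1 cheap.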

Index 2

Index 2 is a sequence of variable-length items, each containing two values: the length of the dictionary key text and the key text itself.
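
The length-prefixed layout can be sketched as below. The length field is assumed to be an unsigned 16-bit big-endian integer (">H") and the key text is assumed to be UTF-8; neither detail is stated in this section. write_key and read_key are hypothetical helper names.

```python
import struct

# Assumption: key length is stored as an unsigned 16-bit big-endian integer.
KEY_LENGTH_FORMAT = ">H"
KEY_LENGTH_SIZE = struct.calcsize(KEY_LENGTH_FORMAT)


def write_key(key):
    """Encode one Index 2 item: length prefix followed by the key bytes."""
    encoded = key.encode("utf-8")
    return struct.pack(KEY_LENGTH_FORMAT, len(encoded)) + encoded


def read_key(index2_bytes, offset):
    """Decode the Index 2 item that starts at the given byte offset."""
    (length,) = struct.unpack_from(KEY_LENGTH_FORMAT, index2_bytes, offset)
    start = offset + KEY_LENGTH_SIZE
    return index2_bytes[start:start + length].decode("utf-8")


sample_index2 = write_key("aardvark") + write_key("zebra")
first_key = read_key(sample_index2, 0)
# The second item starts right after the first item's prefix and key bytes.
second_key = read_key(sample_index2, KEY_LENGTH_SIZE + len("aardvark"))
```

Since items vary in length, an item can only be located through a known byte offset, which is the role of the pointers stored in Index 1.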

Articles

Articles are stored as a sequence of variable-length items, each containing two values: the length of the article text and the article text itself.

See also

Module struct
Documentation for the struct module

Article Formats

From the container format's perspective, an article is just a string that is stored either as is or compressed with gzip or bz2, whichever takes less space. Thus articles in Aard files may be in any format that can be represented as a string, for example plain text or HTML.
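
The "whichever takes less space" rule can be sketched as follows. This is an illustration rather than the tools' actual code: zlib's deflate stands in for gzip, and the reader simply tries each decompressor in turn, since this section does not say how the stored form is identified. compress_article and decompress_article are hypothetical names.

```python
import bz2
import zlib


def compress_article(text):
    """Return the smallest of: raw bytes, zlib-compressed, bz2-compressed."""
    raw = text.encode("utf-8")
    candidates = [raw, zlib.compress(raw), bz2.compress(raw)]
    return min(candidates, key=len)


def decompress_article(data):
    """Try each decompressor; fall back to treating data as uncompressed."""
    for decompress in (zlib.decompress, bz2.decompress):
        try:
            raw = decompress(data)
        except (zlib.error, OSError, ValueError):
            continue  # Not in this format, try the next one.
        return raw.decode("utf-8")
    return data.decode("utf-8")


long_text = "aard " * 200
stored_long = compress_article(long_text)
stored_short = compress_article("hi")
```

Very short articles tend to be stored as is, because both compressors add header overhead that outweighs any savings.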

Aard Dictionary 0.7.x can only display JSON-encoded articles (aar-JSON). Aard Dictionary 0.8.0 supports both aar-JSON and aar-HTML. Aard Dictionary 0.9.0 and Aard Dictionary for Android support only aar-HTML.

aar-JSON
An article is represented by a tuple consisting of article text, tags, and an optional attributes dictionary (so the article tuple length is either 2 or 3). text is a UTF-8 encoded string. tags is a sequence of tag tuples. Each tag tuple holds the tag name, start position, end position, and an optional attribute dictionary (so each tag tuple has a length of either 3 or 4). This data structure is converted to a string via JSON serialization.
aar-HTML
An article is represented as HTML 4 or XHTML 1.0 formatted text without the enclosing html and body tags.
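
To make the aar-JSON structure concrete, here is a small sketch that serializes and parses one article. The article text, tag names, and attribute values are invented. The sketch also assumes that start and end positions are character offsets into the text with an exclusive end, which this section does not specify. parse_aar_json is a hypothetical helper.

```python
import json

# Hypothetical article: text plus two tag tuples, the second with attributes.
article = (
    "Aardvark: a burrowing mammal.",
    [("b", 0, 8), ("a", 12, 21, {"href": "burrow"})],
)
serialized = json.dumps(article)


def parse_aar_json(serialized):
    """Return (text, tags, attributes), filling in omitted optional parts."""
    decoded = json.loads(serialized)
    text, tags = decoded[0], decoded[1]
    attributes = decoded[2] if len(decoded) == 3 else {}
    parsed_tags = []
    for tag in tags:
        name, start, end = tag[:3]
        tag_attributes = tag[3] if len(tag) == 4 else {}
        parsed_tags.append((name, start, end, tag_attributes))
    return text, parsed_tags, attributes


text, tags, attributes = parse_aar_json(serialized)
# Under the offset assumption above, each tag selects a slice of the text.
tagged_spans = [text[start:end] for _, start, end, _ in tags]
```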
