gendict Man page

GENDICT(1) ICU 55.1 Manual GENDICT(1)

NAME

gendict – Compiles word list into ICU string trie dictionary

SYNOPSIS

gendict [ –uchars | –bytes –transform transform ] [ -h, -?, –help ] [ -V, –version ] [ -c, –copyright ] [ -v, –verbose ] [ -i, –icud‐
atadir directory ] input-file output-file

DESCRIPTION

gendict reads the word list from dictionary-file and creates a string
trie dictionary file. Normally this data file has the .dict extension.

Words begin at the beginning of a line and are terminated by the first
whitespace. Lines that begin with whitespace are ignored.

OPTIONS

-h, -?, –help
Print help about usage and exit.

-V, –version
Print the version of gendict and exit.

-c, –copyright
Embeds the standard ICU copyright into the output-file.

-v, –verbose
Display extra informative messages during execution.

-i, –icudatadir directory
Look for any necessary ICU data files in directory. For exam‐
ple, the file pnames.icu must be located when ICU’s data is not
built as a shared library. The default ICU data directory is
specified by the environment variable ICU_DATA. Most configura‐
tions of ICU do not require this argument.

–uchars
Set the output trie type to UChar. Mutually exclusive with
–bytes.

–bytes
Set the output trie type to Bytes. Mutually exclusive with
–uchars.

–transform
Set the transform type. Should only be specified with –bytes.
Currently supported transforms are: offset-, which
specifies an offset to subtract from all input characters. It
should be noted that the offset transform also maps U+200D to
0xFF and U+200C to 0xFE, in order to offer compatibility to lan‐
guages that require these characters. A transform must be spec‐
ified for a bytes trie, and when applied to the non-value char‐
acters in the input-file must produce output between 0x00 and
0xFF.

input-file
The source file to read.

output-file
The file to write the output dictionary to.

CAVEATS
The input-file is assumed to be encoded in UTF-8. The integers in the
input-file that are used as values must be made up of ASCII digits.
They may be specified either in hex, by using a 0x prefix, or in deci‐
mal. Either –bytes or –uchars must be specified.

ENVIRONMENT
ICU_DATA Specifies the directory containing ICU data. Defaults to
${prefix}/share/icu/55.1/. Some tools in ICU depend on the
presence of the trailing slash. It is thus important to make
sure that it is present if ICU_DATA is set.

AUTHORS
Maxime Serrano

VERSION
1.0

COPRYRIGHT

Copyright (C) 2012 International Business Machines Corporation and oth‐
ers

SEE ALSO

http://www.icu-project.org/userguide/boundaryAnalysis.html

ICU MANPAGE 1 June 2012 GENDICT(1)