

WordCruncher

ETAX Documentation

10 September 2009


Table of Contents


Introduction

ETAX Elements

The Root Element <etax></etax>

The Book Information Element <bookInfo></bookInfo>

The Style Include Element <sifx></sifx>

SIFX Elements

The Document Style Element <DS/>

The Paragraph Style Element <PS/>

The Lexicon Element <LEX/>

The Text Style Element <TS/>

The Level Type Element <LVL/>

The Attribute Type Element <ATTR/>

The Reference Tree Element <TREE/>

The Tag Type Element <TAG/>

The Hyperlink Style Element <HLS/>

The Index Options Element <OPT/>

The Phrase Group Element <GRP/>

The Upgrade Elements <UPGRADE><REF/>…</UPGRADE>

ETAX Paragraph Elements

The Paragraph Element <p></p>

The Table Element <ptbl></ptbl>

The Table Row Element <trow></trow>

The Table Cell Element <tcell></tcell>

ETAX Formatting Elements

Paragraph Format Elements

The Text Style Element <T></T>

Text Style Override Elements

Other Elements

Reference Level Elements <R/><Re/>

Hyperlinks <H></H>

Inline Images <I/>

Ruby Text Elements <rt/><rte/>

The Hard Characters Element <ch></ch>

Phrase Group Elements <g/><ge/> <gx/><gxe/>

The Include Element <include/>


Introduction

This is the documentation for the WordCruncher ETAX XML application.

ETAX Elements

These are the elements used in the ETAX file.  Elements and attributes in blue are for future consideration.

The Root Element <etax></etax>

This is the root element for an ETAX file.

Attributes

Name

Values

Description

id

GUID

The universal ID for the book.  (Highly recommended).

sifx

SIFX filename

An optional external SIFX file to be used instead of the <sifx> element in the file.

ettx

ETTX filename

An optional external ETTX file to be used instead of the <ettx> element in the file.

emtx

EMTX filename

An optional external EMTX file to be used instead of the <emtx> element in the file.

isbn

ISBN

The ISBN of the book.

exp

Expiration date: yyyy-mm-dd

The date on which the book expires.  The book remains usable through the given date.

The contents of the <etax> element consist of an optional <bookInfo> element and an optional, single <sifx> element, followed by a list of paragraph elements (any combination of <p> and <ptbl> elements) and <include/> elements.  Any text contained in this element is ignored unless otherwise specified by a child element.
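The ordering rule above can be checked with Python's standard xml.etree.ElementTree module. This is a minimal sketch: it assumes the <bookInfo> and <sifx> placement rules described in the following sections, ignores attributes and namespaces, and check_etax_order is a hypothetical helper, not part of WordCruncher.

```python
import xml.etree.ElementTree as ET
from typing import List

# Children allowed in the body of <etax>, after the optional header elements.
BODY_TAGS = {"p", "ptbl", "include"}


def check_etax_order(xml_text: str) -> List[str]:
    """Return a list of problems found in the child order of an <etax> root."""
    root = ET.fromstring(xml_text)
    if root.tag != "etax":
        return [f"root element is <{root.tag}>, expected <etax>"]
    children = [child.tag for child in root]
    pos = 0
    # Optional <bookInfo>, then optional <sifx>: each at most once, in this order.
    for header in ("bookInfo", "sifx"):
        if pos < len(children) and children[pos] == header:
            pos += 1
    # Everything after the header elements must be a paragraph or an include.
    return [f"unexpected <{tag}> in the body" for tag in children[pos:]
            if tag not in BODY_TAGS]


print(check_etax_order("<etax><sifx/><bookInfo/><p/></etax>"))
# ['unexpected <bookInfo> in the body']
```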

The Book Information Element <bookInfo></bookInfo>

This optional element should appear immediately after the <etax> start element and before any other element in the file.  It contains additional information about the book.

Attributes

Name

Values

Description

title

Possible additions being considered.

Use as default when adding book to library.

author

 

 

publisher

 

 

copyright

 

 

printDate

 

 

revision

 

 

Other (to be determined)

 

 

The Abstract Element <abstract></abstract>

This optional element must be a child of the <bookInfo> element.  The text of this element will be stored in the library when the book is added to the library.  This element has no attributes.

The Style Include Element <sifx></sifx>

This element is optional and must appear immediately after the <bookInfo> element if it exists, or immediately after the <etax> start element if the <bookInfo> element is omitted.  This element is also used as the root element for an external SIFX file.  This element has no attributes and any text contained in this element is ignored.


SIFX Elements

The following are elements that can occur as child elements of the <sifx> element.

The Document Style Element <DS/>

This is the document style element.  It must be an empty element.

Attributes

Name

Values

Description

lnWidth

Measurement value:

m:n[,n]

Where m=t|p|i|c

t->twips     p->points

i->inches    c->centimeters

Where n=Real number.

The second optional number is for small format ETBU.

This is the maximum line width of the document.  (Highly recommended).

mrgL

Measurement value

The left document margin.

mrgR

Measurement value

The right document margin.

clrTxt

Color value: (fg; bk)

Color/RGB; Color/RGB

RGB= n,n,n where n=0-255

Color=[black, blue, brown, cyan, dkblue, dkcyan, dkgray, dkgreen, dkmagenta, dkred, green, ltgray, magenta, red, white, yellow]

The default text color.

clrHlink

Color value: (fg; bk)

The default hyperlink text color.

clrRef

Color value: (fg; bk)

The default reference level color.

clrHit

Color value: (fg; bk)

The default hit word color.

clrHilite

Color value: (fg; bk)

The default color for search result words that are not the current hit word.

clrReader

Color value: (fg; bk)

The color of the reader bar.

idxOff

Any combination of the following values:

bold | italic | script | underline | strikeout | revised | overbar | underbar | caps | effect | hidden

This selects styles that are not indexed by default.

dir

ltr | rtl

General layout direction of the document.  (Highly recommended).

dirCit

ltr | rtl

Layout direction of the citation.  (Highly recommended).

tHeight

Measurement value

Default text height.  (Highly recommended).

dict

Lexicon file name.

If this attribute exists, the document is marked as a dictionary.  This is the lexicon file that the dictionary uses.  Cannot be used with srchlex.

lvlDict

Single character.

The level type code to use as the entry level for the dictionary.  Ignored if dict is not specified.

concordance

yes | no

Marks the document as a concordance.

srchlex

yes | no

If this attribute exists, the document is marked as a search lexicon.  Cannot be used with dict.

lvlSrchlex

Single character.

The level type code to use as the entry level for the search lexicon.  Ignored if srchlex is not specified.

lvlSrch

Single character.

This is the level type to be used in “lowest level” searches.

lvlOutput

Single character.

This is the level type to be used in “lowest level” outputs (copy).

rtDisp

top | bottom | off

Where to display the ruby text in relation to the base text.

rtIdx

yes | no

Whether or not to index the ruby text.

rtJust

center | left | right

Justification of the ruby text

rtLex

Lexicon name

Name of the lexicon this ruby text should be included in.  If omitted, the ruby text is included in the same lexicon as the base text.

rtPos

Measurement value

Adjustment of the position of the ruby text.

rtSize

Whole number percentage between 40 and 80.  (Default 60)

Size of the ruby text based on a percentage of the height of the base characters.

rtSt

Text style name   String[63]

The text style used for the ruby text.

zoom

Whole number percentage between 50 and 300.  (Default: 100)

This is the general zoom percentage of the document.

pStTagwnd

Paragraph style name.   String[63]

The default paragraph style to be used in the Tag Window.
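Several attributes above take a measurement value of the form m:n[,n]. As an illustration, the Python sketch below parses such a value and converts it to twips using the standard conversions (1 inch = 1440 twips, 1 point = 20 twips, 1 cm = 1440/2.54 twips). Converting everything to twips, allowing a leading sign, and the name parse_measurement are assumptions made for this example, not documented WordCruncher behavior.

```python
import re
from typing import Optional, Tuple

# Twips per unit: 1 inch = 1440 twips, 1 point = 20 twips, 1 cm = 1440 / 2.54 twips.
TWIPS_PER_UNIT = {"t": 1.0, "p": 20.0, "i": 1440.0, "c": 1440.0 / 2.54}

# A real number with an optional sign (allowing a sign is an assumption).
NUMBER = r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)"
MEASUREMENT = re.compile(rf"^([tpic]):({NUMBER})(?:,({NUMBER}))?$")


def parse_measurement(value: str) -> Tuple[float, Optional[float]]:
    """Return (regular, small_format) in twips; small_format is None if absent."""
    match = MEASUREMENT.match(value.strip())
    if match is None:
        raise ValueError(f"not a measurement value: {value!r}")
    unit, regular, small = match.groups()
    factor = TWIPS_PER_UNIT[unit]
    small_twips = float(small) * factor if small is not None else None
    return float(regular) * factor, small_twips


# Example: a 6.5-inch line width with a 4-inch width for small format ETBU.
print(parse_measurement("i:6.5,4"))  # (9360.0, 5760.0)
```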

The Paragraph Style Element <PS/>

This is the paragraph style element.  It must be an empty element.  The first paragraph style listed will be considered the default paragraph style.

Attributes

Name

Values

Description

st

String[63].

The name of the paragraph style.  (Required).

tSt

String[63].

The name of a text style associated with this paragraph style.  If no text style is active, this style will be used.

just

left | right | center | full

Paragraph justification.

dir

ltr | rtl

Layout direction of the paragraph.  (Highly recommended).

spB

Measurement value.

Extra space added before the paragraph.

spA

Measurement value.

Extra space added after the paragraph.

lnHeight

Measurement value.

Fixed height of the line.

lnSp

Real number between 0.25 and 32.0.  (Default: 1.0)

Line spacing multiplier.

wrap

win | line

Line wrapping mode.

indF

Measurement value.

First line indent.

indL

Measurement value.

Left indent.

indR

Measurement value.

Right indent.

clrBk

Color value: (bk)

Background color of the paragraph.

tabs

Multiple semicolon separated tab measurement values:

m:Tln[,n]{; Tln[,n]}*

 

Where m=t|p|i|c

t->twips           p->points

i->inches         c->centimeters

Where T=L|R|C|D       (Optional)

L->left              R->right

C->center          D->decimal

Where l=d|p|u|w|x|y   (Optional)

d->dash                            p->dot

u->underline                     w->dot blank

x->underline dot               y->dash blank

Where n= tab position, and the optional [,n]= tab position for small format ETBU.

Tab sets for the paragraph.

tabsDef

Single tab measurement value.

Default tab sets for the paragraph.  These sets occur after the last explicitly defined tab set.

bdrL

Left border

m:n[,n];p[,p];l;clr

 

Where m=t|p|i|c

t->twips           p->points

i->inches         c->centimeters

Where n= width

Where [,n]= width (small format ETBU)

Where p= padding

Where l= single | double | dot | dash | wave

Where clr= Border color (fg)

NOTE: All semicolon-separated sections are optional; however, the semicolons are not.

The border to use for the left of the paragraph.

bdrR

Right border.

The border to use for the right of the paragraph.

bdrT

Top border.

The border to use for the top of the paragraph.

bdrB

Bottom border.

The border to use for the bottom of the paragraph.
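The tabs and border formats above pack several optional parts into one string. The Python sketch below splits a tabs value into individual tab stops and a border value into its four semicolon-separated sections, converting lengths to twips. The function names, the use of twips, the default of a left tab when T is omitted, and the fallback to twips when a border has no unit prefix are all assumptions for illustration.

```python
import re
from typing import Optional

TWIPS_PER_UNIT = {"t": 1.0, "p": 20.0, "i": 1440.0, "c": 1440.0 / 2.54}
TAB_TYPES = {"L": "left", "R": "right", "C": "center", "D": "decimal"}
LEADERS = {"d": "dash", "p": "dot", "u": "underline",
           "w": "dot blank", "x": "underline dot", "y": "dash blank"}
NUMBER = r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)"
# One tab stop: optional type T, optional leader l, position n, optional small-format n.
TAB = re.compile(rf"^([LRCD]?)([dpuwxy]?)({NUMBER})(?:,({NUMBER}))?$")


def unit_factor(unit: str) -> float:
    if unit not in TWIPS_PER_UNIT:
        raise ValueError(f"unknown unit: {unit!r}")
    return TWIPS_PER_UNIT[unit]


def parse_tabs(value: str) -> list:
    """Parse 'm:Tln[,n]{; Tln[,n]}*' into tab-stop dicts (positions in twips)."""
    unit, _, body = value.partition(":")
    factor = unit_factor(unit.strip())
    tabs = []
    for item in body.split(";"):
        match = TAB.match(item.strip())
        if match is None:
            raise ValueError(f"bad tab definition: {item!r}")
        kind, leader, pos, small = match.groups()
        tabs.append({
            "type": TAB_TYPES.get(kind, "left"),  # assumption: left when T is omitted
            "leader": LEADERS.get(leader),         # None when no leader is given
            "pos": float(pos) * factor,
            "pos_small": float(small) * factor if small is not None else None,
        })
    return tabs


def parse_border(value: str) -> dict:
    """Parse 'm:n[,n];p[,p];l;clr'; sections may be empty, the semicolons may not."""
    parts = [part.strip() for part in value.split(";")]
    if len(parts) != 4:
        raise ValueError("a border value needs exactly three semicolons")
    width, padding, line, color = parts
    factor = TWIPS_PER_UNIT["t"]  # assumption: twips when no unit prefix is given
    if ":" in width:
        unit, width = width.split(":", 1)
        factor = unit_factor(unit.strip())

    def numbers(text: str) -> Optional[tuple]:
        # Regular value first, optional small-format value second.
        return tuple(float(n) * factor for n in text.split(",")) if text else None

    return {"width": numbers(width), "padding": numbers(padding),
            "line": line or None, "color": color or None}
```

For example, parse_tabs("i:L1; Rp2.5,2") yields a left tab at 1440 twips and a right tab with a dot leader at 3600 twips (2880 twips in small format).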

The Lexicon Element <LEX/>

This is the lexicon element.  This element is used to define separate categories that words in the document can be stored in.  At least one <LEX> element must be defined.  These are empty elements.

Attributes

Name

Values

Description

st

String[63].

The name of the lexicon.  (Required).

id

Standard language name or standard language abbreviation for the lexicon.

The logical language of all the words in the lexicon.

dec

Single character.

The character that should be used as the decimal separator for the lexicon.  If specified, then grp must also be specified.

grp

Single character

The character that should be used as the numeric group separator for the lexicon.  If specified, then dec must also be specified.

chrIgn

String.

List of ignore characters.  These are characters that will not be included in the text of a word.

chrBrk

String.

List of break characters.  These are characters that will automatically break a word.

chrNobrk

String.

List of no-break characters.  These are characters that will not automatically break a word.

tSt

Text style name.   String[63]

Default style for the lexicon.

wrdbrk

Any combination of the following values:

hidden-nobrk | style-brk | script-nobrk

Word breaking mode for this lexicon.
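The interplay of chrIgn, chrBrk, and chrNobrk can be pictured with a toy tokenizer. This Python sketch is not WordCruncher's word breaker: it assumes that ignore characters are dropped first, that break characters end a word and are discarded, that no-break characters are kept inside a word, and that any other character that is not a letter or digit also breaks a word. The function split_words is hypothetical, and the dec, grp, and wrdbrk settings are not modeled.

```python
from typing import List


def split_words(text: str, chr_ign: str = "", chr_brk: str = "",
                chr_nobrk: str = "") -> List[str]:
    """Split text into words under assumed chrIgn/chrBrk/chrNobrk semantics."""
    words, current = [], []
    for ch in text:
        if ch in chr_ign:
            continue                   # ignore characters are never part of a word
        if ch in chr_brk:
            breaks = True              # break characters always end the current word
        elif ch in chr_nobrk:
            breaks = False             # no-break characters stay inside the word
        else:
            breaks = not ch.isalnum()  # assumption: other non-alphanumerics break words
        if not breaks:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


print(split_words("well-known words", chr_nobrk="-"))  # ['well-known', 'words']
```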

The Text Style Element <TS/>

This is the text style element.  It must be an empty element.

Attributes

Name

Values

Description

st

String[63].

The name of the text style.  (Required).

tHeight

Measurement value

Height of the font used for displaying text.

tWidth

Measurement value

Width of the font used for displaying text.  (Not recommended).

lexSt

Lexicon name.   String[63]

The name of the lexicon associated with this text style.

fFace

String

Regular font face

fFaceSm

String

Font face used for a small format ETBU.

fFaceAlt

String

Alternate font face to use if fFace does not exist.

fFamily

decorative | default | modern | roman | script | swiss

Font family. (Not recommended).

fPitch

default | fixed | proportional | variable

Font pitch. (Not recommended).

fQuality

antialiased | cleartype | default | draft | non-antialised | proof

Output quality of the font for the text.

tagtype

Tag type code.   Single Character

Name of a tag type style to apply to this text style.

clrTxt

Color value: (fg/bk).

Color of the text.

clrUnderline

Color value: (fg)

Color of the underline

clrStrikeout

Color value: (fg)

Color of the strikeout

clrOverbar

Color value: (fg)

Color of the overbar

clrUnderbar

Color value: (fg)

Color of the underbar

chrProp

Any combination of the following values (space separated).  Some of the values are mutually exclusive:

bold | italic | (superscript | subscript) | hidden | revised | noindex | (underline | dash-underline | dot-underline | double-underline | wave-underline) | (emboss | engrave | outline | shadow) | (smcaps | allcaps | smcaps-up) | subword | tag | (strikeout | dash-strikeout | dot-strikeout | double-strikeout | wave-strikeout) | (overbar | dash-overbar | dot-overbar | double-overbar | wave-overbar) | (underbar | dash-underbar | dot-underbar | double-underbar | wave-underbar)

Character style of the text.
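The color attributes above use the color value format defined under <DS/>: a named color or an RGB triple, optionally followed by a semicolon and a background color. A minimal Python parser for that format might look like this. Treating color names as case-insensitive and an empty part as "not set" are assumptions, and the function names are hypothetical. Named colors are returned as names because this document does not give their RGB values.

```python
from typing import Optional, Tuple, Union

NAMED_COLORS = {
    "black", "blue", "brown", "cyan", "dkblue", "dkcyan", "dkgray", "dkgreen",
    "dkmagenta", "dkred", "green", "ltgray", "magenta", "red", "white", "yellow",
}

Color = Union[str, Tuple[int, ...]]


def parse_color(text: str) -> Optional[Color]:
    """Parse one color: a named color or 'n,n,n' with each n in 0-255; empty means unset."""
    text = text.strip()
    if not text:
        return None
    if text.lower() in NAMED_COLORS:  # assumption: names are case-insensitive
        return text.lower()
    parts = text.split(",")
    if len(parts) == 3 and all(p.strip().isdigit() for p in parts):
        rgb = tuple(int(p) for p in parts)
        if all(0 <= c <= 255 for c in rgb):
            return rgb
    raise ValueError(f"not a color value: {text!r}")


def parse_color_pair(value: str) -> Tuple[Optional[Color], Optional[Color]]:
    """Parse 'fg; bk', where the background part may be omitted."""
    parts = value.split(";")
    if len(parts) > 2:
        raise ValueError(f"too many ';' in color value: {value!r}")
    fg = parse_color(parts[0])
    bk = parse_color(parts[1]) if len(parts) == 2 else None
    return fg, bk


print(parse_color_pair("dkblue; 255,255,224"))  # ('dkblue', (255, 255, 224))
```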

The Level Type Element <LVL/>

This is the level type element.  It must be an empty element.

Attributes

Name

Values

Description

code

Single character.

The single character code used for the level type.  (Required).

name

String[31]

The name of the level type.  (Required).

plural

String[31]

The plural name of the level type.  (Recommended).

sep

String[31]

String to use in the citation line just before displaying this type of level.

tSt

String[63].

The text style to use for levels of this type.

tagtype

Tag type code.   Single Character

Name of a tag type style to apply to this level type. The name and abbreviation of every level of this type will be indexed as the given tag type.  If this is omitted, then the tag type defined in tSt will be used.

lexSt

Lexicon name.   String[63]

The name of the lexicon associated with this level type.  Used only if the name will be indexed.  If this is omitted, then the lexicon defined in tSt will be used if it is defined, otherwise the indexer will use the currently defined lexicon.
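The lexSt fallback described for <LVL/> is a three-step lookup. The following Python sketch expresses it with small dataclasses; TextStyle, LevelType, and level_name_lexicon are hypothetical models of the <TS/> and <LVL/> attributes, not part of any WordCruncher API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TextStyle:
    """Hypothetical model of the <TS/> attributes used here."""
    st: str
    lexSt: Optional[str] = None


@dataclass
class LevelType:
    """Hypothetical model of the <LVL/> attributes used here."""
    code: str
    name: str
    tSt: Optional[str] = None
    lexSt: Optional[str] = None


def level_name_lexicon(level: LevelType, styles: Dict[str, TextStyle], current: str) -> str:
    """Pick the lexicon used to index a level name, following the fallback order above."""
    if level.lexSt:                          # 1. the level type's own lexSt
        return level.lexSt
    style = styles.get(level.tSt) if level.tSt else None
    if style is not None and style.lexSt:    # 2. the lexicon of the level's text style
        return style.lexSt
    return current                           # 3. the lexicon currently active in the indexer


styles = {"Heading": TextStyle("Heading", lexSt="English")}
print(level_name_lexicon(LevelType("C", "Chapter", tSt="Heading"), styles, "Greek"))  # English
```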

The Attribute Type Element <ATTR/>

This is the attribute type element.  Attribute types are used to define attributes for reference levels.  It must be an empty element.

Attributes

Name

Values

Description

code

Single character.

The single character code used for the attribute type.  (Required).

name

String[31]

The name of the attribute type.  (Required).

plural

String[31]

The plural name of the attribute type.  (Recommended).

tagtype

Tag type code.   Single Character

Name of a tag type style to apply to this attribute type. The name of every attribute of this type will be indexed as the given tag type.

lexSt

Lexicon name.   String[63]

The name of the lexicon associated with this attribute type.  Used only if the name will be indexed.  If this is omitted, then the lexicon defined in the tSt attribute of the Level Type for the given reference level will be used.  If that is not defined, the indexer will use the currently defined lexicon.

The Reference Tree Element <TREE/>

This is the reference tree element.  Up to eight reference hierarchies can be defined in a document.  This element allows the author to define a name for each of these “trees”.  It must be an empty element.

Attributes

Name

Values

Description

idx

Whole number between 1 and 8.

The index of the reference tree.  (Required).

name

String[63]

The name to associate with the reference tree.  (Required).

default

yes | no

Specifies the default reference tree.  Only one tree can be marked as the default tree.  If no trees are marked as the default, the first populated tree is used.
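The default-tree rule for <TREE/> can be sketched in Python with xml.etree.ElementTree. The populated argument (the set of tree numbers that actually carry reference levels) and the reading of "first populated tree" as the lowest tree index are assumptions for this example; default_tree is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional


def default_tree(sifx_xml: str, populated: Iterable[int]) -> Optional[int]:
    """Return the index of the default reference tree, or None if no tree is in use."""
    root = ET.fromstring(sifx_xml)
    marked = [int(tree.get("idx")) for tree in root.iter("TREE")
              if tree.get("default") == "yes"]
    if len(marked) > 1:
        raise ValueError("only one <TREE/> may be marked as the default")
    if marked:
        return marked[0]
    used = sorted(set(populated))   # assumption: "first" means the lowest tree index
    return used[0] if used else None


sifx = '<sifx><TREE idx="1" name="Pages"/><TREE idx="2" name="Sections"/></sifx>'
print(default_tree(sifx, populated={2, 3}))  # 2
```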

The Tag Type Element <TAG/>

This is the tag type element.  It must be an empty element.  You can specify a maximum of 14 different tag types.

Attributes

Name

Values

Description

code

Single character.

The single character code used for the tag type.  (Required).

name

String[31]

The name of the tag type.  (Required).

plural

String[31]

The plural name of the tag type.  (Recommended).

expand

yes | no

If yes, showing this tag type will automatically cause the expansion of any generic tag sections.

The Hyperlink Style Element <HLS/>

This is the hyperlink style element.  These define general styles used for hyperlinks.  It must be an empty element.

Attributes

Name

Values

Description

st

String[63].

The name of the hyperlink style.  (Required).

type

icon | phrase

The hyperlink type

tHeight

Measurement value

Height of the font used for displaying text.

tWidth

Measurement value

Width of the font used for displaying text.  (Not recommended).

fFace

String

Regular font face

fFaceSm

String

Font face used for a small format ETBU.

fFacePrint

String

Font face used for printing.

fFamily

decorative | default | modern | roman | script | swiss

Font family. (Not recommended).

fPitch

default | fixed | proportional | variable

Font pitch. (Not recommended).

fQuality

antialiased | cleartype | default | draft | non-antialised | proof

Output quality of the font for the text.

clrTxt

Color value: (fg/bk).

Color of the text.

clrUnderline

Color value: (fg)

Color of the underline

clrStrikeout

Color value: (fg)

Color of the strikeout

clrOverbar

Color value: (fg)

Color of the overbar

clrUnderbar

Color value: (fg)

Color of the underbar

chrProp

Any combination of the following values (space separated).  Some of the values are mutually exclusive:

bold | italic | (superscript | subscript) | hidden | revised | noindex | (underline | dash-underline | dot-underline | double-underline | wave-underline) | (emboss | engrave | outline | shadow) | (smcaps | allcaps | smcaps-up) | subword | tag | (strikeout | dash-strikeout | dot-strikeout | double-strikeout | wave-strikeout) | (overbar | dash-overbar | dot-overbar | double-overbar | wave-overbar) | (underbar | dash-underbar | dot-underbar | double-underbar | wave-underbar)

Character style of the text.

em

yes | no

Forces emphasis of the target

lib

yes | no

Forces the hyperlink, when executed, to look in the Library first for a matching target.

The Index Options Element <OPT/>

This is the index options element.  It must be an empty element.

Attributes

Name

Values

Description

wrdbrk

Any combination of the following values:

hidden-nobrk | style-brk | script-nobrk

Word breaking options.  This is the default for the whole document and can be overridden by the same attribute in the <LEX> element.

Stopwrds

stop | go

If a stopword file exists, this specifies whether the words in it are stopwords or gowords.

Comp

Any combination of the following values:

off | on | text | index

Compression options.

The Phrase Group Element <GRP/>

This is the phrase group element.  This defines global properties for any phrase group that is used in the document.  It must be an empty element.

Attributes

Name

Values

Description

idx

Integer between 1 and 32000.

The index of the phrase group.  (Required).

lexSt

Lexicon name.   String[63]

The lexicon name to use for the phrase group.  (Required).

 

The Upgrade Elements <UPGRADE><REF/>…</UPGRADE>

These elements provide the information needed to translate citations from a previous version of the file to the current version.  For instance, if a particular citation in a previous ETB version (ver. 5) was “/Introduction” and it was changed to “/Title/Introduction” in the new version (ver. 7), these elements provide enough information to make that translation.  This information is used primarily when upgrading note files that were attached to previous versions of the document: old citations are translated to new ones so that the new positions of the notes can be located.  It is NOT used in the indexing of the document.

Each <UPGRADE> element includes one or more empty <REF/> elements.  Each <REF/> element defines one citation translation.  Multiple (up to 4) <UPGRADE> elements can be included, one for each previous version.  However, at this point there is only one previous version that has been released.

Attributes <UPGRADE></UPGRADE>

Name

Values

Description

ver

5

The previous file format version.  At this time, only version “5” is allowed.  (Required).

file

File title.   String[63]

This is the previous file title (i.e., the file name without the extension).  (Required).

codepage

Code page integer identifier.

  1250 – Central European (Windows)

  1251 – Cyrillic (Windows)

  1252 – Western European (Windows)

  1253 – Greek (Windows)

  1254 – Turkish (Windows)

  1255 – Hebrew (Windows)

  1256 – Arabic (Windows)

  1257 – Baltic (Windows)

  Others…

This is the codepage that will be used to translate the old citation to Unicode. (Required).

Attributes <REF/>

Name

Values

Description

old

Citation (without offset)

This is the citation in the previous version that is in need of translation. It can be a partial citation. (Required).

new

Citation (without offset)

This is the new citation.  When a citation is in need of translation, if an old citation is found, then it will be replaced with the new citation.  A citation is considered a match if the old citation completely matches the citation being translated (up to the number of levels defined).  If the citation in question has levels beyond the match, they are concatenated onto the end of the translated citation.  All entries will be checked and the longest match will be used for translation. (Required).

children

yes | no

If yes, then the citation will only be translated if the citation in question has child levels beyond the match.

codepage

Code page integer identifier.

  1250 – Central European (Windows)

  1251 – Cyrillic (Windows)

  1252 – Western European (Windows)

  1253 – Greek (Windows)

  1254 – Turkish (Windows)

  1255 – Hebrew (Windows)

  1256 – Arabic (Windows)

  1257 – Baltic (Windows)

  Others…

This is the codepage that will be used to translate the old citation to Unicode. This will override the codepage specified in the <UPGRADE> element.
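The matching rules for <REF/> (a complete match on the old citation's levels, extra levels carried over, the longest match winning, and the children flag) can be sketched in Python. This assumes citations are '/'-separated level names, as in the “/Title/Introduction” example, and ignores code pages; RefRule, levels, and translate are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RefRule:
    """One <REF/> element: old and new citations plus the children flag."""
    old: str
    new: str
    children: bool = False


def levels(citation: str) -> List[str]:
    # Assumption: citations are '/'-separated level names, as in "/Title/Introduction".
    return [part for part in citation.split("/") if part]


def translate(citation: str, rules: List[RefRule]) -> str:
    cited = levels(citation)
    best, best_len = None, -1
    for rule in rules:
        old = levels(rule.old)
        if cited[:len(old)] != old:
            continue          # the old citation must match level by level
        if rule.children and len(cited) == len(old):
            continue          # children="yes": translate only if deeper levels follow
        if len(old) > best_len:
            best, best_len = rule, len(old)   # keep the longest match
    if best is None:
        return citation       # no rule applies; leave the citation unchanged
    rest = cited[best_len:]   # levels beyond the match are appended unchanged
    return "/" + "/".join(levels(best.new) + rest)


rules = [
    RefRule("/Introduction", "/Title/Introduction"),
    RefRule("/Introduction/Part 1", "/Title/Preface"),
    RefRule("/Appendix", "/Back/Appendix", children=True),
]
print(translate("/Introduction/Part 1/Para 3", rules))  # /Title/Preface/Para 3
```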


ETAX Paragraph Elements

The body of an ETAX document consists of a list of paragraph elements.  There are currently two types of paragraphs: normal and table.

The Paragraph Element <p></p>

This is the normal paragraph element.  It can have any of the attributes that a <PS/> element can have, except that the st attribute selects an existing paragraph style instead of naming a new one.  These attributes override the corresponding settings of the currently active paragraph style.  The <p> element can also include the following attributes:

Attributes

Name

Values

Description

st

String[63]

The name of the paragraph style to use.  If omitted, the default paragraph style is assumed (i.e., the first paragraph style listed in the <sifx> element).

{See <PS> element in the <sifx>}

 

These will override any settings in the currently active paragraph style.

Any text contained within this element is included in the text of the document.

The Table Element <ptbl></ptbl>

This is the table paragraph element.  It is considered an alternate type of paragraph and marks the start of a table.  The <ptbl> element can have any of the attributes that the <p> element can have plus the following:

Attributes

Name

Values

Description

col

List of measurement values separated by semicolons.  Each measurement value can be substituted with either an asterisk (*) or a percentage to represent an automatically calculated column width or a width based on a percentage of the line width.

The width of each column in the table.

colMin

A measurement value.

The minimum width of any column in the table.

tblType

flat | 3d

Visual style of the table

valign

top | center | bottom

Vertical alignment of the text in each cell

hpad

Measurement value

Horizontal internal cell padding.

vpad

Measurement value

Vertical internal cell padding

spc

Measurement value

Spacing between adjacent cells

bdr

Measurement value

Border width

inbdr

Measurement value

Width of inside borders

clrBdr

Color value: (fg)

Color of borders

The <ptbl> element contains a list of <trow> elements.  Any text contained within a <ptbl> element is ignored unless it is within a child <p> element.
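To illustrate how fixed, percentage, and automatic (*) widths in col might combine, here is a Python sketch that resolves a col value against a given line width in twips. The document does not say how automatic columns share the remaining space or how colMin interacts with the total width, so the equal split and the per-column minimum below are assumptions; column_widths is a hypothetical name.

```python
from typing import List, Optional

TWIPS_PER_UNIT = {"t": 1.0, "p": 20.0, "i": 1440.0, "c": 1440.0 / 2.54}


def column_widths(col: str, line_width: float, col_min: float = 0.0) -> List[float]:
    """Resolve a col attribute such as 'i:1;*;25%' into column widths in twips."""
    widths: List[Optional[float]] = []
    for spec in (s.strip() for s in col.split(";")):
        if spec == "*":
            widths.append(None)                            # automatic; resolved below
        elif spec.endswith("%"):
            widths.append(line_width * float(spec[:-1]) / 100.0)
        else:
            unit, number = spec.split(":", 1)
            # Only the regular width is used; a small-format ",n" part is ignored here.
            widths.append(float(number.split(",")[0]) * TWIPS_PER_UNIT[unit])
    auto = widths.count(None)
    if auto:
        fixed = sum(w for w in widths if w is not None)
        share = max(line_width - fixed, 0.0) / auto        # assumption: equal shares
        widths = [share if w is None else w for w in widths]
    return [max(w, col_min) for w in widths]               # enforce colMin on every column


print(column_widths("i:1;*;25%", line_width=8640))  # [1440.0, 5040.0, 2160.0]
```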

The Table Row Element <trow></trow>

This is the table row element.  It defines each row of a table.  This element contains a list of <tcell> elements.  Any text contained within a <trow> element is ignored unless it is within a child <p> element. This element can have any of the same attributes that a <tcell> element can have, except for spanCol and spanRow.  These attributes will apply to each table cell on the row unless specifically overridden by the <tcell> element.

The Table Cell Element <tcell></tcell>

This is the table cell element.  It defines each cell contained in a table row.  This element contains a list of paragraph elements (either <p> or <ptbl> elements).  Any text contained within a <tcell> element is ignored unless it is within a child <p> element.  This element can contain the following attributes:

Attributes

Name

Values

Description

valign

top | center | bottom

Vertical alignment of the text in each cell

hpad

Measurement value

Horizontal internal cell padding.

vpad

Measurement value

Vertical internal cell padding

spanCol

Whole number between 1 and 16

Number of columns for this cell to span.  This number cannot exceed the number of columns left in the row.  If a cell spans multiple columns, the spanned cells are NOT emitted.

spanRow

Whole number between 1 and 16

Number of rows for this cell to span.  If a cell spans multiple rows, the spanned cells on the following rows are NOT emitted.

clrBk

Color value: (bk)

Background color of the cell.  This color overrides the background color of a table.

bdrL

Left border (See <PS/>).

The border to use for the left of the cell.

bdrR

Right border.

The border to use for the right of the cell.

bdrT

Top border.

The border to use for the top of the cell.

bdrB

Bottom border

The border to use for the bottom of the cell.
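Because spanned cells are not emitted, a reader must track which grid positions are already covered when placing the next <tcell>. The Python sketch below does this for a table described as rows of (spanCol, spanRow) pairs. The input shape and the name layout_cells are assumptions; it checks the column-limit rule stated above and also rejects overlapping spans.

```python
from typing import List, Tuple


def layout_cells(rows: List[List[Tuple[int, int]]],
                 num_cols: int) -> List[Tuple[int, int, int, int]]:
    """Place emitted cells on a grid.

    rows[r] lists (spanCol, spanRow) for each <tcell> actually emitted in row r.
    Returns (row, column, spanCol, spanRow) for every cell.
    """
    occupied = set()
    placements = []
    for r, cells in enumerate(rows):
        c = 0
        for span_col, span_row in cells:
            while (r, c) in occupied:
                c += 1  # skip slots covered by a cell spanning down from a previous row
            if c + span_col > num_cols:
                raise ValueError(f"row {r}: a cell spans past the last column")
            slots = {(r + dr, c + dc) for dr in range(span_row) for dc in range(span_col)}
            if slots & occupied:
                raise ValueError(f"row {r}: overlapping spans")
            occupied |= slots
            placements.append((r, c, span_col, span_row))
            c += span_col
    return placements


# Row 0: a cell spanning two rows, then a cell spanning two columns.
# Row 1: only two cells are emitted, because column 0 is still covered.
print(layout_cells([[(1, 2), (2, 1)], [(1, 1), (1, 1)]], num_cols=3))
# [(0, 0, 1, 2), (0, 1, 2, 1), (1, 1, 1, 1), (1, 2, 1, 1)]
```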

 


ETAX Formatting Elements

This section describes the elements that can be used inside a paragraph (<p>) definition.

Paragraph Format Elements

Inside a paragraph definition (<p>), all printable characters are included in the text of the paragraph.  Any character with a Unicode value less than SPACE (U+0020), such as tabs, line feeds, and carriage returns, is ignored.  This allows the paragraph text to be laid out freely in the XML document.  Several empty elements are used instead of these characters:

 

Element

Description

<tab/>

Inserts a literal tab.

<br/>

Inserts a hard return.  This does NOT break the logical paragraph.

<w/>

Inserts a hard word break.

<sp/>

Inserts a hard non-breaking space.

<zs/>

Inserts a zero width space.

<l/>

Inserts a left indent.  This is essentially a tab that also sets the left indent property for the rest of the paragraph

<r/>

Inserts a right indent.  The same as the left indent, except extends from the right side of the paragraph.

<d/>

Inserts a double indent.  This is equivalent to inserting both a left and a right indent simultaneously.

<lm/>

Inserts a Unicode LTR (Left To Right) mark.

<lo/>

Inserts a Unicode LTR override mark.

<le/>

Inserts a Unicode LTR embedding mark.

<rm/>

Inserts a Unicode RTL (Right To Left) mark.

<ro/>

Inserts a Unicode RTL override mark.

<re/>

Inserts a Unicode RTL embedding mark.

<pdf/>

Inserts a Unicode PDF (Pop Directional Format) mark.  This is used to terminate any of the above directional override or embedding modes (<lo/><le/><ro/><re/>).
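The following Python sketch shows how a processor might collect the text of a <p> element under these rules: characters below U+0020 are dropped, and each empty element that has a standard Unicode equivalent (tab, no-break space, zero width space, and the bidirectional marks) is replaced by that character. Representing <br/> as a line feed and dropping <w/>, <l/>, <r/>, and <d/> are simplifications for illustration; paragraph_text is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Empty elements that correspond directly to standard Unicode characters.
UNICODE_CHARS = {
    "tab": "\t",       # CHARACTER TABULATION
    "sp": "\u00a0",    # NO-BREAK SPACE
    "zs": "\u200b",    # ZERO WIDTH SPACE
    "lm": "\u200e",    # LEFT-TO-RIGHT MARK
    "lo": "\u202d",    # LEFT-TO-RIGHT OVERRIDE
    "le": "\u202a",    # LEFT-TO-RIGHT EMBEDDING
    "rm": "\u200f",    # RIGHT-TO-LEFT MARK
    "ro": "\u202e",    # RIGHT-TO-LEFT OVERRIDE
    "re": "\u202b",    # RIGHT-TO-LEFT EMBEDDING
    "pdf": "\u202c",   # POP DIRECTIONAL FORMATTING
}


def printable(text) -> str:
    """Drop every character below U+0020 (tabs, line feeds, carriage returns, ...)."""
    return "".join(ch for ch in (text or "") if ord(ch) >= 0x20)


def paragraph_text(p_xml: str) -> str:
    """Collect the text of a <p> element as a reader might see it."""
    out = []

    def walk(elem) -> None:
        out.append(printable(elem.text))
        for child in elem:
            if child.tag in UNICODE_CHARS:
                out.append(UNICODE_CHARS[child.tag])
            elif child.tag == "br":
                out.append("\n")   # hypothetical stand-in for a hard return
            else:
                walk(child)        # formatting elements such as <b> or <T> wrap text;
                                   # <w/>, <l/>, <r/>, <d/> add nothing in this sketch
            out.append(printable(child.tail))

    walk(ET.fromstring(p_xml))
    return "".join(out)


print(repr(paragraph_text("<p>Hello,\n<b>world</b><sp/>again<tab/>x</p>")))
# 'Hello,world\xa0again\tx'
```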

The Text Style Element <T></T>

This element (<T>) is used to set the current text style.  This is NOT an empty element.  The text style remains in effect for all text contained within the element and overrides any text style specified in the paragraph element or the currently active paragraph style.  If no text style element is output, or if text occurs outside this element, the text style specified in the paragraph or paragraph style is used.

Attributes

Name

Values

Description

st

String[63]

The name of a <TS> record in the sifx.  (Required).

Text Style Override Elements

Each attribute of a text style can be overridden individually by one of the Text Style Override Elements.  These are NOT empty elements; their settings remain in effect for all text contained within the element.  These elements are listed below:

 

Element

Description

<b>

Turns bolding on/off.

<i>

Turns italics on/off

<s>

Turns superscript or subscript on/off

<u>

Turns underline on/off.

<o>

Turns strikeout on/off.

<ob>

Turns overbar on/off.

<ub>

Turns underbar on/off.

<c>

Turns all caps or small caps on/off.

<e>

Turns on a special effect like embossing, engraving, outline, or shadow.

<x>

Turns indexing on/off.

<rev>

Turns on/off revised text.

<h>

Turns on/off hidden text.

<f>

Changes the font.

<lex>

Changes the currently active lexicon.

<sz>

Changes the size of the text.

<tt>

Changes the current tag type.

<t>

Turns the general tagtype flag on/off.

<sw>

Turns the subword flag on/off.  All text in this element will be indexed as a subword.

<cf>

Changes the foreground color.

<cb>

Changes the background color.

<cu>

Changes the underline color.

<co>

Changes the strikeout color.

<cob>

Changes the overbar color.

<cub>

Changes the underbar color.

<ch>

Forces hard characters.  Characters with this style cannot be delimiters.  See “Hard Characters Element” below.

 

Many of the above elements take similar attributes, so their attributes are described below in groups.

Attributes (<b><i><x><rev><h><t><sw>)

Name

Values

Description

val

on | off

This overrides the corresponding text style property and turns the property either on or off.  If this attribute is omitted, the property is turned on.

Attributes (<s>)

Name

Values

Description

val

super | sub | off

This overrides the script text style property and either turns off any scripting or turns on superscript or subscript.  If this attribute is omitted, superscript is turned on.

Attributes (<u><o><ob><ub>)

Name

Values

Description

val

single | dash | dot | double | wave | off

This overrides the corresponding text style property and sets the line style accordingly.  If this attribute is omitted, the single line style is used.

Attributes (<c>)

Name

Values

Description

val

small | all | up | off

This overrides the caps text style property and sets the style accordingly.  If this attribute is omitted, small caps is used.  The up style shows the text as small caps but indexes it as all caps.

Attributes (<e>)

Name

Values

Description

val

outline | emboss | engrave | shadow | off

This overrides the effect text style property and sets the style accordingly.   (Required)
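The val defaults in the attribute tables above can be summarized in one lookup. This Python sketch returns the value that applies to an override element given its attributes (for example, an ElementTree element's attrib dictionary). effective_val is a hypothetical helper, and it covers only the elements whose val tables appear above.

```python
ON_OFF = ("b", "i", "x", "rev", "h", "t", "sw")
LINES = ("u", "o", "ob", "ub")

# Allowed values of val for each override element, from the attribute tables above.
ALLOWED = {
    **{tag: {"on", "off"} for tag in ON_OFF},
    "s": {"super", "sub", "off"},
    **{tag: {"single", "dash", "dot", "double", "wave", "off"} for tag in LINES},
    "c": {"small", "all", "up", "off"},
    "e": {"outline", "emboss", "engrave", "shadow", "off"},
}

# Value used when val is omitted; <e> has no default because its val is required.
DEFAULTS = {**{tag: "on" for tag in ON_OFF}, "s": "super",
            **{tag: "single" for tag in LINES}, "c": "small"}


def effective_val(tag: str, attrib: dict) -> str:
    """Return the val that applies to an override element with the given attributes."""
    if tag not in ALLOWED:
        raise ValueError(f"<{tag}> is not covered by this sketch")
    val = attrib.get("val")
    if val is None:
        if tag not in DEFAULTS:
            raise ValueError(f"<{tag}> requires a val attribute")
        return DEFAULTS[tag]
    if val not in ALLOWED[tag]:
        raise ValueError(f"<{tag}> does not accept val={val!r}")
    return val


print(effective_val("c", {}))  # small
```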

Attributes (<f>)

Name

Values

Description

fFace

String

Regular font face

fFaceSm

String

Font face used for a small format ETBU.

fFacePrint

String

Font face used for printing.

fFamily

decorative | default | modern | roman | script | swiss

Font family. (Not recommended).

fPitch

default | fixed | proportional | variable

Font pitch. (Not recommended).

fQuality

antialiased | cleartype | default | draft | non-antialised | proof

Output quality of the font for the text.

Attributes (<lex>)

Name

Values

Description

st

The name of a <LEX> record in the sifx.

This changes the currently active lexicon.  Any words in this element will be indexed into that lexicon.

Attributes (<sz>)

Name

Values

Description

val

Either a measurement value or a percentage.

This is either a measurement value or a percentage.  If it is a percentage, the size of the text is calculated dynamically as a percentage of the current window size.  This is useful for title pages.

Attributes (<tt>)

Name

Values

Description

st

The code defined in a <TAG> record in the sifx. Single Character

This changes the currently active tag type.

Attributes (<cf><cb><co><cu><cob><cub>)

Name

Values

Description

val

A color value

This overrides the corresponding color property.

Other Elements

There are several other elements that do not fit into any of the classes above.

Reference Level Elements <R/><Re/>

Reference level elements <R/> are used to define a hierarchical structure for the file.  Hyperlinks use this structure as an address, so that the reader can be positioned correctly when a hyperlink is followed.  Up to eight different hierarchy “trees” can be defined in a single document, and these trees can overlap.  For instance, you may want a section/sub-section hierarchy, a book/chapter/verse hierarchy, and a page/paragraph hierarchy all in the same document.  XML element nesting cannot express multiple overlapping hierarchies, so these elements are implemented as empty elements.  When a reader encounters one of these elements, it must record the level and tree and keep this reference active until the next element for the same tree is found.

 

Attributes <R/>

Name

Values

Description

ref

A reference definition in the following format:

l,d[,t]:name

 

Where: l= Code for a <LVL> record.

Where: d= The level number (depth) within the tree.

Where: t= The tree number (1-8).

Where: name= The name of the level.

This is the definition of this reference code.  For instance, to define a reference level for the title page of a document in the first reference tree, we could write:

ref="S,1:Title Page"

The first paragraph in the title page could be:

ref="P,2:1"

This assumes that the sifx defines <LVL> records with the codes ‘S’ and ‘P’.  Note that the level number for the title page is 1, while the level number of the paragraph is 2, so the paragraph is below (part of) the title page in the hierarchy.  Notice also that the tree number has been omitted; the default is always the first tree.  Any text after the colon is the name of the level.  The name may be omitted, but if it is, no text will be displayed in the table of contents for this reference level.  (Required).

abrv

String

An abbreviation for the name of the level.

hide

yes | no

Normally the user can choose which reference codes to show or hide.  If the author never wants a code to be displayed, this attribute may be included (hide="yes").

attr

A reference attribute definition in the following format:

a:name[;a:name]*

Where: a= Code for an <ATTR> record.

Where: name= The name of the attribute.

Each reference level can be given author-defined attributes.  These attributes consist of a type (defined in the sifx) and a name.  For instance, you may want to give a “Topic” to a section:

attr="T:Budget"

You may give more than one attribute:

attr="T:Budget;S:President"

This assumes that the sifx defines <ATTR> records with the codes ‘T’ and ‘S’, perhaps for “Topic” and “Speaker” respectively.  These categories are entirely author-defined and can be used to limit or bound searches.
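The ref and attr formats above can be split mechanically.  The following is a minimal Python sketch of how a reader might do this, not WordCruncher's own code; the names RefLevel, parse_ref, and parse_attr are hypothetical, and it assumes that attribute names never contain a semicolon.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# l,d[,t]:name  (code, depth, optional tree 1-8, optional name after the colon)
_REF_RE = re.compile(r"^([^,:]+),(\d+)(?:,([1-8]))?(?::(.*))?$")


@dataclass
class RefLevel:
    code: str   # code of an <LVL> record in the sifx, e.g. "S"
    depth: int  # level number within the tree (1 = top level)
    tree: int   # tree number 1-8; the first tree is the default
    name: str   # display name; empty if omitted (nothing shown in the TOC)


def parse_ref(value: str) -> RefLevel:
    """Split an <R/> ref attribute such as "S,1:Title Page"."""
    m = _REF_RE.match(value)
    if m is None:
        raise ValueError(f"malformed ref attribute: {value!r}")
    code, depth, tree, name = m.groups()
    return RefLevel(code, int(depth), int(tree) if tree else 1, name or "")


def parse_attr(value: str) -> List[Tuple[str, str]]:
    """Split an <R/> attr attribute such as "T:Budget;S:President"."""
    pairs = []
    for item in value.split(";"):
        code, sep, name = item.partition(":")
        if not sep or not code:
            raise ValueError(f"malformed attr entry: {item!r}")
        pairs.append((code, name))  # (code of an <ATTR> record, attribute name)
    return pairs
```

For example, parse_ref("P,2:1") yields code "P", depth 2, and the default tree 1.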

 

The <Re/> element terminates the last (deepest) active level for a given tree.  Normally, when another reference element is encountered for a tree, any elements currently active for that tree whose level is greater than or equal to the level of the new element are terminated automatically.  Use <Re/> to terminate a level explicitly.

Attributes <Re/>

Name

Values

Description

tree

A tree number (1-8)

This is the tree whose last reference level should be terminated.  By default (if this attribute is omitted), the first tree is used.
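The termination rules for <R/> and <Re/> amount to one stack per tree.  The following Python sketch models only those rules; ReferenceTracker and its methods are hypothetical names, and building a cit-style path from the active names is an illustrative assumption.

```python
from typing import Dict, List, Tuple


class ReferenceTracker:
    """Tracks the active reference levels for each of the eight trees."""

    def __init__(self) -> None:
        # tree number -> stack of (depth, name), outermost level first
        self.trees: Dict[int, List[Tuple[int, str]]] = {t: [] for t in range(1, 9)}

    def on_r(self, depth: int, name: str, tree: int = 1) -> None:
        stack = self.trees[tree]
        # A new <R/> terminates every active level at the same or a deeper level.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        stack.append((depth, name))

    def on_re(self, tree: int = 1) -> None:
        # <Re/> terminates the last (deepest) active level of the tree.
        if self.trees[tree]:
            self.trees[tree].pop()

    def path(self, tree: int = 1) -> str:
        # Current position as a cit-style path, e.g. "/Title Page/1".
        return "/" + "/".join(name for _, name in self.trees[tree])
```

Because each tree has its own stack, a page/paragraph hierarchy in tree 2 never disturbs a section hierarchy in tree 1.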

Hyperlinks <H></H>

There are several different types of hyperlinks:

1)       Cross reference hyperlinks.  These are used to jump from one section of text to another.  The destination may also be in a completely different document.

2)       Image hyperlinks.  These are used to display a picture or image in a separate window.

3)       Shell hyperlinks.  These are used to spawn a separate application, start an email, or open a web browser.

4)       DDE hyperlinks.  These are used to control a second application.

The type of hyperlink is given by a code in the st attribute; other attributes define its remaining properties.  Depending on the style of the hyperlink (phrase or icon, as defined in the <HLS> record in the sifx), the hyperlink is either an empty element (icon) or contains text (phrase).  For a phrase hyperlink, clicking any text inside the element executes the hyperlink.

Attributes <H/>

Name

Values

Description

st

The style of the hyperlink in the following format:

t:name

 

Where: t= X | I | S | D

Where: name= The name of an <HLS> record.

The type codes are defined as follows:

X= A cross-reference hyperlink

I= An image hyperlink

S= A shell hyperlink

D= A DDE hyperlink

(Required).

file

A path to a file, usually relative to the location of the current document.  The following macros may be used to specify additional paths:

%TEXT% - The current document path.

%PROGRAM% - The WordCruncher program path.

This is used to specify an external file in:

·   a cross-reference hyperlink (Optional),

·   an image file in an image hyperlink (Required),

·   a file, web page, e-mail address, etc. for a shell hyperlink (Required).

 

It is not used for DDE hyperlinks (st="X|I|S").

fileAux

One or more paths to files using the same rules as the file attribute.  Files are specified using the following format:

file[;file]*

Auxiliary files that further identify which book in a library should be used as the target.  If the desired target is a Book Set, this attribute can contain the other files in the set.  The software will attempt to match each auxiliary file to other files in the book set.

 

This is only used in cross-reference hyperlinks (st="X"), where it is optional.

cit

A forward slash (/) delimited string of reference names optionally prefaced by a tree number and optionally terminated by a word offset or a reference gap number:

[t:]/name[/name]*[(:word[,subword[,tagword]]) | (#gap)]

This is the reference hierarchy path of the destination.  Examples might be:

cit="/Section 1/Sub-Section 5"

cit="2:/Page 1/Paragraph 3"

cit="/Section 1/Sub-Section 5:3"



This is only used for cross-reference hyperlinks (st="X").

citRng

One or more citation ranges delimited by semicolons:

cit[-cit][;cit[-cit]]*

Where cit is the same as in the cit attribute, without the optional tree number.  Also, the cit can be a relative citation based on the citation in the cit attribute.

This is used to emphasize a range of words or references when a hyperlink is taken.  Each citation can be a full citation as defined in the cit attribute (without the optional tree number), or a relative citation based on the citation given in the cit attribute.  A relative citation must NOT be prefaced by a forward slash.  It replaces the last level of the base citation, and it may begin with one or more “..” components, each of which removes one further level from the base citation.  For instance:



cit="/Section/1" citRng="1-2"

will emphasize /Section/1 through /Section/2



cit="/Section/3/7" citRng="6-../4/9"

will emphasize /Section/3/6 through /Section/4/9.



This is only used for cross-reference hyperlinks (st="X").

idx

Number

Some image formats can contain multiple images.  This is the index of the image in the file.  The default is the first image.

 

This is only used with image hyperlinks (st="I").

page

Number

Some image formats can contain multi-page images.  This is used to specify a particular page.



This is only used with image hyperlinks (st="I").

rect

This is a rectangle measurement value:

m:(x,y,w,h)

Where m=t|p|i|c

t->twips

p->points

i->inches

c->centimeters

Where x,y,w,h= Real numbers.

These correspond to the left, top, width, height dimensions of the source image.

This is used to crop the output of an image.

 

It is only used with image hyperlinks (st="I").

op

String

This is an OLE verb such as “open” or “print”.



It is only used with shell hyperlinks (st="S").

cmd

String

This is a user-defined command string.



It is used with shell hyperlinks and DDE hyperlinks (st="S|D").

Path

A path.  The same macros that were used for the file attribute can be used here.

This is a path to use.

 

It is only used with shell hyperlinks (st="S").
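The st, cit, and citRng formats above are compact enough to parse mechanically.  The following is a minimal Python sketch, not WordCruncher's own code.  It assumes that reference names contain no ':', '#', or '/' characters (and no '-' inside citRng items), and it ignores word offsets inside citRng.  All function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

HYPERLINK_TYPES = {"X": "cross-reference", "I": "image", "S": "shell", "D": "DDE"}

# [t:]/name[/name]*[(:word[,subword[,tagword]]) | (#gap)]
_CIT_RE = re.compile(
    r"^(?:([1-8]):)?"                               # optional tree number
    r"(/[^:#]*)"                                    # /name[/name]*
    r"(?::(\d+)(?:,(\d+)(?:,(\d+))?)?|#(\d+))?$"    # word offset or gap number
)


@dataclass
class Citation:
    tree: int
    path: List[str]
    word: Optional[Tuple[int, ...]] = None  # (word[, subword[, tagword]])
    gap: Optional[int] = None


def parse_style(st: str) -> Tuple[str, str]:
    """Split st="t:name" into the type code and the <HLS> record name."""
    kind, sep, name = st.partition(":")
    if not sep or kind not in HYPERLINK_TYPES:
        raise ValueError(f"malformed st attribute: {st!r}")
    return kind, name


def parse_cit(cit: str) -> Citation:
    m = _CIT_RE.match(cit)
    if m is None:
        raise ValueError(f"malformed cit attribute: {cit!r}")
    tree, path, w, sw, tw, gap = m.groups()
    word = tuple(int(x) for x in (w, sw, tw) if x is not None) or None
    return Citation(int(tree) if tree else 1, path.strip("/").split("/"),
                    word, int(gap) if gap else None)


def _resolve(base: List[str], item: str) -> List[str]:
    if item.startswith("/"):            # full citation
        return item.strip("/").split("/")
    parts = item.split("/")
    result = base[:-1]                  # a relative citation replaces the last level
    while parts and parts[0] == "..":   # each ".." removes one more level
        parts.pop(0)
        result = result[:-1]
    return result + parts


def resolve_citrng(cit: str, cit_rng: str) -> List[Tuple[str, str]]:
    """Expand citRng into (first, last) absolute paths."""
    base = parse_cit(cit).path
    ranges = []
    for item in cit_rng.split(";"):
        first, _, last = item.partition("-")
        start = _resolve(base, first)
        end = _resolve(base, last) if last else start
        ranges.append(("/" + "/".join(start), "/" + "/".join(end)))
    return ranges
```

Applied to the two citRng examples given for the citRng attribute, resolve_citrng reproduces the documented ranges: "1-2" against /Section/1, and "6-../4/9" against /Section/3/7.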

Inline Images <I/>

Inline images are usually empty elements.  If no file attribute is given for an inline image, the text inside the element must be a base64-encoded image.  No other text is allowed inside an inline image element.
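The rule above gives a reader two ways to obtain the image.  The following Python sketch, using the standard xml.etree.ElementTree and base64 modules, shows one way to tell them apart.  The function names are hypothetical, and it assumes that the %TEXT% and %PROGRAM% macros (described for the file attribute) expand to directory paths followed by a path separator in the attribute value.

```python
import base64
import os
import xml.etree.ElementTree as ET
from typing import Tuple, Union


def expand_path(path: str, text_dir: str, program_dir: str) -> str:
    """Expand %TEXT%/%PROGRAM% and resolve the result against the document folder."""
    path = path.replace("%TEXT%", text_dir).replace("%PROGRAM%", program_dir)
    # os.path.join returns `path` unchanged if it is already absolute.
    return os.path.normpath(os.path.join(text_dir, path))


def image_source(elem: ET.Element, text_dir: str,
                 program_dir: str) -> Tuple[str, Union[str, bytes]]:
    """Return ("file", path) or ("data", raw image bytes) for an <I> element."""
    file_attr = elem.get("file")
    if file_attr:
        return "file", expand_path(file_attr, text_dir, program_dir)
    encoded = (elem.text or "").strip()
    if not encoded:
        raise ValueError("<I> needs either a file attribute or base64 content")
    # b64decode discards line breaks and other non-alphabet characters by default.
    return "data", base64.b64decode(encoded)
```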

Attributes <I/>

Name

Values

Description

file

A path to a file, usually relative to the location of the current document.  The following macros may be used to specify additional paths:

%TEXT% - The current document path.

%PROGRAM% - The WordCruncher program path.

This is used to specify an external image file in an inline image (Optional).  If it is omitted, the text of the element must be a base64-encoded image.

idx

Number

Some image formats can contain multiple images.  This is the index of the image in the file.  The default is the first image.

page

Number

Some image formats can contain multi-page images.  This is used to specify a particular page.

dim

The display dimensions of the image in a paired measurement value format:

m:(x,y)[,(x,y)]

Where m=t|p|i|c

t->twips           p->points

i->inches         c->centimeters

Where x,y= Real numbers.

The optional second pair of values is used for small-format ETBUs.

These are the logical output dimensions of the image.  (Required)

desc

String

Image description used in the ETGU.

rect

This is a rectangle measurement value:

m:(x,y,w,h)

Where m=t|p|i|c

t->twips           p->points

i->inches         c->centimeters

Where x,y,w,h= Real numbers.

These correspond to the left, top, width, height dimensions of the source image.

This is used to crop the output of an image.
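The dim and rect values share the same "unit:(numbers)" shape, so a single helper can parse both.  The following Python sketch converts every value to twips using the standard factors (1 point = 20 twips, 1 inch = 1440 twips, 1 cm = 1440/2.54 twips).  The function names are hypothetical, and normalizing to twips is an illustrative choice rather than documented WordCruncher behavior.

```python
import re
from typing import List, Tuple

# Twips per unit for the m codes t, p, i, c.
TWIPS_PER_UNIT = {"t": 1.0, "p": 20.0, "i": 1440.0, "c": 1440.0 / 2.54}


def _parse(value: str) -> List[Tuple[float, ...]]:
    """Split "m:(a,b,...)[,(a,b,...)]" into tuples of values converted to twips."""
    unit, sep, rest = value.partition(":")
    if not sep or unit not in TWIPS_PER_UNIT:
        raise ValueError(f"unknown measurement unit in {value!r}")
    factor = TWIPS_PER_UNIT[unit]
    groups = re.findall(r"\(([^()]*)\)", rest)
    return [tuple(float(n) * factor for n in g.split(",")) for g in groups]


def parse_rect(value: str) -> Tuple[float, ...]:
    """rect="m:(x,y,w,h)" -> (left, top, width, height) in twips."""
    groups = _parse(value)
    if len(groups) != 1 or len(groups[0]) != 4:
        raise ValueError(f"rect needs exactly four numbers: {value!r}")
    return groups[0]


def parse_dim(value: str) -> List[Tuple[float, ...]]:
    """dim="m:(x,y)[,(x,y)]" -> [(x, y)] or [(x, y), (small_x, small_y)] in twips."""
    groups = _parse(value)
    if len(groups) not in (1, 2) or any(len(g) != 2 for g in groups):
        raise ValueError(f"dim needs one or two (x,y) pairs: {value!r}")
    return groups
```

For example, parse_rect("i:(0,0,2,1.5)") returns (0.0, 0.0, 2880.0, 2160.0).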

Ruby Text Elements <rt/><rte/>

These elements are used to place small comments or notes above or below the main text in the document.  They are commonly used to place furigana text in Japanese documents.  The ruby text is positioned relative to the text between the begin element (<rt/>) and the end element (<rte/>).  Both of these must be empty elements.

Attributes

Name

Values

Description

val

String

This is the ruby text to use.  (Required).

idx

yes | no

If the ruby text falls within a single logical word, the base characters can be replaced with the ruby text characters and the resulting word indexed as well.

disp

top | bottom | off

The location to place the text.

just

center | left | right

The justification of the ruby text in relation to the base characters.

sz

Percentage

Size of the ruby text based on a percentage of the base font.  A comma separated second value may be included for use in small format ETBUs.

pos

Measurement value

Ruby text will be automatically placed above or below the text.  If additional space is desired, this value can be used to raise or lower the ruby text.

st

String[63]

A text style upon which to base the ruby text.

lex

String[63]

The lexicon into which any indexed words are placed.
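Because <rt/> and <rte/> are empty elements, the base text they bracket lives in element tails and may cross formatting elements.  The following Python sketch, using xml.etree.ElementTree, pairs each ruby string with its base text.  The function names are hypothetical, and it handles only the val attribute; idx, disp, just, and the other attributes are ignored.

```python
import xml.etree.ElementTree as ET
from typing import Any, Iterator, List, Optional, Tuple


def _events(elem: ET.Element) -> Iterator[Tuple[str, Any]]:
    """Yield ("start", element) and ("text", str) in document order (own tail excluded)."""
    yield "start", elem
    if elem.text:
        yield "text", elem.text
    for child in elem:
        yield from _events(child)
        if child.tail:
            yield "text", child.tail


def ruby_spans(paragraph: ET.Element) -> List[Tuple[str, str]]:
    """Return (base text, ruby text) for every <rt/> ... <rte/> span in a paragraph."""
    spans: List[Tuple[str, str]] = []
    ruby: Optional[str] = None
    base: List[str] = []
    for kind, value in _events(paragraph):
        if kind == "start" and value.tag == "rt":
            ruby, base = value.get("val", ""), []
        elif kind == "start" and value.tag == "rte":
            if ruby is not None:
                spans.append(("".join(base), ruby))
            ruby = None
        elif kind == "text" and ruby is not None:
            base.append(value)  # text inside nested formatting elements counts too
    return spans
```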

 

The Hard Characters Element <ch></ch>

This element can be used to change the default behavior of the word parser in the WordCruncher Indexer program.  The word parser is generally good at finding the appropriate boundary between words.  However, there may be times when the Indexer selects a word boundary that is not optimal for a particular situation.  Any text inside a <ch> element (including whitespace) will be part of the current word and will not delimit the word.  Note that a word may still be terminated by the end of a paragraph, a change in lexicon, or any other markup (<tab/>, for instance) that would otherwise place a physical break in the word.  Likewise, the word is not automatically terminated at the end of the element unless a delimiter immediately follows it.
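The effect of <ch> on word boundaries can be illustrated with a deliberately simplified tokenizer.  The following Python sketch is not the Indexer's parser.  It assumes a tiny hypothetical delimiter set, represents a paragraph as (text, inside_ch) segments, and ignores lexicon changes and other markup breaks.

```python
from typing import List, Tuple

# Assumption: a tiny delimiter set; the real Indexer's word parser is far richer.
DELIMITERS = set(" \t\n.,;:!?")


def split_words(segments: List[Tuple[str, bool]]) -> List[str]:
    """segments: (text, inside_ch) pairs in document order within one paragraph."""
    words: List[str] = []
    current: List[str] = []
    for text, hard in segments:
        for ch in text:
            if not hard and ch in DELIMITERS:
                if current:
                    words.append("".join(current))
                    current = []
            else:
                current.append(ch)  # text inside <ch> never delimits a word
    if current:
        words.append("".join(current))
    return words
```

With the segments ("New ", False), ("York City", True), (" is big.", False), the sketch keeps "York City" as one word.  With ("ab", True), ("cd e", False), it produces "abcd", since nothing delimits the word at the end of the element.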

Phrase Group Elements <g/><ge/> <gx/><gxe/>

These elements are used to create phrasal groups of words, which are displayed on the word wheel.  Because formatting elements can be used within these groups, and the groups cannot be split up, they are implemented as empty elements.  When the reader encounters a group beginning element <g/>, the group must remain active until the corresponding group ending element <ge/> is found.  The exclusion elements (<gx/> and <gxe/>) work similarly, except that they are used to exclude words from the middle of a phrase.

Attributes

Name

Values

Description

idx

Number between 1 and 32767

This is the index of the group.  It is used to match up <g/>-<ge/> and <gx/>-<gxe/> elements.  If it is omitted, the index defaults to zero.  Explicit indexes are useful when nested groupings are desired.
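Matching groups by idx can be sketched over a flattened token stream.  In the following Python example, the token representation and collect_phrases are hypothetical.  It also assumes that an active exclusion removes words from every open group, which the description above does not state explicitly.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical token stream: a word, or (marker, idx) for <g/>, <ge/>, <gx/>, <gxe/>.
Token = Union[str, Tuple[str, int]]


def collect_phrases(tokens: List[Token]) -> List[str]:
    open_groups: Dict[int, List[str]] = {}   # idx -> words collected so far
    open_exclusions: Set[int] = set()        # idx of active <gx/> ranges
    phrases: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            if not open_exclusions:          # excluded words join no group
                for words in open_groups.values():
                    words.append(tok)
            continue
        marker, idx = tok
        if marker == "g":
            open_groups[idx] = []
        elif marker == "ge":
            done = open_groups.pop(idx, None)
            if done is not None:
                phrases.append(" ".join(done))
        elif marker == "gx":
            open_exclusions.add(idx)
        elif marker == "gxe":
            open_exclusions.discard(idx)
        else:
            raise ValueError(f"unknown marker {marker!r}")
    return phrases
```

With nested indexes 1 and 2 around "New York Times", the sketch yields the inner phrase "York" when <ge/> 2 closes and "New York Times" when <ge/> 1 closes.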