RTF, or Rich Text Format, is Microsoft's interchange format
for word processing files.
RATFINK is a set of Tcl routines for creating RTF files,
and a Cost interface for converting SGML to RTF.
RTF is also the basis for Windows Help (WINHELP) format.
RATFINK does not currently contain any direct support for WINHELP.
rtflib.tcl contains the low-level utility routines.
This file is not Cost-specific and may be used in any Tcl script.
RTF.spec contains the extra Cost commands for SGML conversion.
Lexically, RTF files are a simple stream of
text, control words, and groups.
All text is seven-bit ASCII.
Control words are lowercase alphabetic tokens beginning with a backslash,
followed by an optional integer parameter, and terminated by a space
or a non-alphanumeric character.
Groups are nested data enclosed in curly braces.
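As a rough illustration of these lexical rules, the following plain-Tcl sketch splits a small RTF fragment into braces, control words, and runs of text. The proc name rtfTokens and the sample string are made up for this example and are not part of RATFINK; escaped braces and other control symbols are deliberately ignored.

```tcl
# Hypothetical helper (not part of RATFINK): split RTF into lexical tokens.
proc rtfTokens {rtf} {
    # A token is a control word (with optional integer parameter and one
    # delimiting space), or a single brace, or a run of plain text.
    set pattern {\\[a-z]+-?[0-9]* ?|[{}]|[^\\{}]+}
    return [regexp -all -inline -- $pattern $rtf]
}

set sample {{\rtf1\ansi Hello, \b world\b0 !}}
foreach token [rtfTokens $sample] {
    puts "<$token>"
}
```

Each token is printed between angle brackets so that the delimiting space after a control word is visible.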
Semantically, things start to get complicated.
RTF files start with a header, which contains a
font table, color table, stylesheet, and other metainformation.
The header is followed by the document data.
Everything in the document is a paragraph,
except for the stuff that isn't.
The stuff that isn't a paragraph includes
section text, which contains paragraphs and tables;
table rows, which contain cells, which contain paragraphs;
field instructions (see 3.10. ``Fields''),
which can appear inside paragraphs; and
other stuff.
Actually it's all very messy and you shouldn't have to worry
about it too much except to note that
every block of displayed data is
considered a paragraph, and that
RTF has a ``flat'' structure:
sections, paragraphs, and table rows do not nest.
NOTE -- Groups, on the other hand, do nest, and group boundaries
can cross section, paragraph, row, and cell boundaries, but don't
worry about that either.
Formatting is determined by stylesheet entries.
There are three types of stylesheet entries in RTF:
character styles, paragraph styles, and section styles.
These are defined with the commands
rtf:charStyle, rtf:paraStyle, and rtf:sectStyle,
respectively.
All three commands have the same syntax:
rtf:xxxStyle id "description" [ -basedon styleid ] {
param value
param value
...
}
Each stylesheet entry has a symbolic id
by which it is referenced in other commands,
and a textual description,
which is inserted into the RTF file as the style name.
TIP -- Some RTF style names are interpreted specially
by various word processors;
see 2.6. ``Special style names'', below.
If -basedon is specified,
then all of the style properties listed in the
named styleid are copied to the style being defined.
Parameters in the new style override those in the
base style. styleid must be the id
of a previously-defined style of the same type
(paragraph, character, or section).
Style properties are specified as a list
of param-value pairs.
Parameters map (more or less) onto RTF control words.
Parameter values take one of several forms,
depending on the parameter.
Boolean parameters specify true/false values;
the value must be 1 or 0.
Flag parameters are like booleans, but the
parameter can only be turned on (flag parameters correspond
to properties which are off by default and turned on by the
presence of an RTF control word.)
Dimension parameters specify a length
as a number followed by a unit name,
e.g., 12pt, 0.5in; see 2.1.1. ``Dimensions'' for more details.
Other parameters may be integers, a list of enumerated values,
or some other type as described below.
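Putting the pieces together, a stylesheet might look something like the sketch below. It assumes rtflib.tcl is in the current directory; the style ids (emph, body, litem) and descriptions are chosen for illustration only, and the parameters used are ones described elsewhere in this document. Flag parameters are left out because the exact form of their value is not spelled out here.

```tcl
# Sketch only: assumes rtflib.tcl is in the current directory.
source rtflib.tcl

# Character style for emphasized phrases (id and description are examples).
rtf:charStyle emph "Emphasis" {
    Italic 1
}

# Base paragraph style; character properties are allowed here too.
rtf:paraStyle body "Body Text" {
    FontSize    11pt
    SpaceAfter  6pt
    Quadding    Justify
}

# Derived style: copies the properties of body, then overrides Quadding
# and adds a hanging indent via a negative FirstIndent.
rtf:paraStyle litem "List Item" -basedon body {
    LeftIndent   0.5in
    FirstIndent  -0.25in
    Quadding     Left
}
```

Note that body is defined before litem, since -basedon must name a previously-defined style of the same type.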
Most RTF control words expect lengths to be specified in twips;
others expect them to be specified in half-points.
There are 20 twips to a point, and
there are 72 points to an inch.
NOTE -- There are several definitions of a ``point'';
RTF uses the conventional DTP definition of exactly 72 points to the inch.
Since twips and half-points are not a very convenient way to specify lengths,
RATFINK allows you to specify dimensions in terms of other units
and converts to twips or half-points as appropriate.
Dimension specifications are decimal numbers,
with an optional minus sign and fractional part,
followed immediately by one of the following units:
in
inches
pt
points (72 points = 1 inch)
pc
pica
picas (6 picas = 1 inch)
twip
1/20th point
mm
millimeters (but see note)
cm
centimeters (but see note)
NOTE -- Due to roundoff errors in converting millimeters to twips,
the metric units are unreliable. Also, different versions of Word
seem to use different rounding conventions.
It's best to avoid cm and mm if possible.
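The unit arithmetic can be sketched in plain Tcl as follows. This is not RATFINK's own conversion code; the proc name dimToTwips is hypothetical, and rounding to the nearest twip is an assumption (as noted above, the rounding convention is exactly where implementations differ).

```tcl
# Hypothetical helper (not RATFINK's own code): convert a dimension to twips.
proc dimToTwips {dim} {
    # 20 twips per point, 72 points per inch, 6 picas per inch
    array set perUnit {in 1440.0 pt 20.0 pc 240.0 pica 240.0 twip 1.0}
    set perUnit(mm) [expr {1440.0 / 25.4}]
    set perUnit(cm) [expr {1440.0 / 2.54}]
    set re {^(-?[0-9]+(?:\.[0-9]+)?)(in|pt|pc|pica|twip|mm|cm)$}
    if {![regexp -- $re $dim -> num unit]} {
        error "bad dimension \"$dim\""
    }
    # scan reads the number as decimal, so a leading zero is not taken as octal
    scan $num %f value
    return [expr {round($value * $perUnit($unit))}]
}

puts [dimToTwips 0.5in]   ;# 720
puts [dimToTwips 12pt]    ;# 240
puts [dimToTwips 10mm]    ;# 567 (10mm is about 566.93 twips before rounding)
```

The last call shows why the metric units are troublesome: most round metric lengths do not come out to a whole number of twips.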
Every paragraph style may include its own set of tab stops.
This is specified by the TabStops paragraph style property.
Tab stops are specified as a list of tabspecs;
each tabspec is a list consisting of a dimension
followed by any of the following property specifications:
Align
Specifies how the text following the tab is to be aligned with
respect to this tab stop. One of:
Left,
Right,
Center, or
Decimal.
Leaders
One of:
Dot
Leader dots or periods
Thick
Leader thick line
Underscore
Leader underline
Equal
Leader equal sign (=)
Hyphen
Leader hyphens (-)
Tab stops may also be defined with the rtf:tabStops command:
rtf:tabStops name { tabspec ... }
Defines and assigns to name a set of tab stops.
name may be referenced by a TabStops parameter
in subsequent paragraph style definitions.
RATFINK predefines the rule styles
thin, thick, and double.
Additional rule and border styles may be defined with the
rtf:ruleStyle command.
Like rtf:tabStops, this defines a symbolic name for a particular
rule style that is referenced as the value of other stylesheet parameters.
rtf:ruleStyle name {
param value
...
}
Defines a rule style named name. Allowable parameters are:
Style
One of the symbols Normal, Thick, Double,
Shadow, Dash, Dot, or Hairline.
Thick specifies a double-thickness border; Normal is the default.
Thickness
Dimension: thickness of the rule
Margin
Dimension: space to leave between the rule and the text
to which the rule is attached.
NOTE -- It is unclear what
the \brdrhair (Style Hairline)
control word means, since the thickness of the rule is actually determined by
the \brdrw (Thickness) control word.
RATFINK makes sure that all the control words are output in the
correct order.
One of the symbolic font names defined in the font table;
e.g., roman, sans, or mono.
See 2.1.2. ``Fonts''.
FontSize
Dimension: height of the font. If no units are specified,
defaults to half-points. Does not include leading.
Bold
Boolean: if set, use bold font variant
Italic
Boolean: if set, use italic font variant.
AllCaps
Boolean: if set, folds all text to uppercase
SmallCaps
Boolean: if set, folds lowercase letters to small caps.
Hidden
Boolean: hidden text.
Underline
One of:
0
None
No underline
1
Single
Single (continuous) underline
Double
Double underline
Dot
Dotted underline
Word
Underline words only (single underline)
NOTE -- The ``shadow'' and ``outline'' character formatting properties
have been omitted on aesthetic grounds.
Considering how Word typically renders ``small caps,''
that should probably be avoided as well.
NOTE -- Apparently it is not possible in RTF to specify double word underline.
Defines a paragraph style, which may be referenced by id
in a later call to rtf:startPara.
Available parameters are:
LeftIndent
Dimension: Left margin, relative to page margin
RightIndent
Dimension: Right margin, relative to page margin
FirstIndent
Dimension: Indentation of first line, relative to LeftIndent.
Quadding
One of the symbols
Left
Flush left, ragged right
Right
Flush right, ragged left
Center
Each line centered
Justify
Both margins aligned
SpaceBefore
Dimension: vertical whitespace inserted before the paragraph
SpaceAfter
Dimension: vertical whitespace inserted after the paragraph
LineSpacing
Dimension: minimum height of each line (including leading).
If omitted or set to zero, the font height is used.
TabStops
Reference to a set of tab stops previously defined with rtf:tabStops
(see 2.1.3. ``Tab stops'').
Hyphenate
Boolean: enable (1) or disable (0) hyphenation for this paragraph.
PageBreakBefore
Force a page break before this paragraph
KeepTogether
Flag: disallow page breaks within this paragraph
KeepWithNext
Flag: disallow page break between this paragraph and the next one.
TopBorder
BottomBorder
LeftBorder
RightBorder
Argument is a rule style as described in 2.1.4. ``Rules and borders''.
Specifies that a rule is to be drawn above, below, to the left, or
to the right of the paragraph.
InnerBorders
Flag: border specifications apply to individual paragraphs.
In addition, all of the character formatting style attributes
(see 2.2. ``Character styles'')
may be specified for a paragraph style.
TIP -- To achieve ``hanging indentation'', use a negative value
for FirstIndent.
NOTE -- If a series of successive paragraphs specify the same set of borders,
the borders are drawn around the group as a whole unless the InnerBorders
flag is specified.
Flag: if set, page orientation is landscape instead of portrait
for this section.
NOTE -- It is unclear how this affects the interpretation
of the page size and margin control words.
HasTitlePage
Flag: if set, the first page of this section may use a
different header and footer than the rest of the section.
See 3.6. ``Page headers and footers''.
HeaderPosition
Dimension: Distance from the top of the page to the page header.
FooterPosition
Dimension: Distance from the bottom of the page to the page footer.
NOTE -- It is unclear whether these specify distance to the
top or the bottom of the header and footer.
SectionBreak
One of the following symbols:
None
No explicit page break before this section
Page
Section starts on a new page
EvenPage
Section starts on the next even-numbered page
OddPage
Section starts on the next odd-numbered page
VAlign
Specifies the vertical alignment of the text with
respect to the page margins. One of the following
symbolic values:
Top
Ragged-bottom pages, aligned with the top margin (default).
Justify
Text is set flush-bottom.
Middle
Text is centered between the top and bottom margins.
Bottom
Text is aligned with the bottom margin.
PageNumbering
Determines the format for the PageNumber special control word
(see 3.3. ``Special characters'').
One of Arabic, UCRoman, LCRoman,
UCAlpha, or LCAlpha,
for arabic (decimal), uppercase roman numerals, lowercase roman numerals,
uppercase letters, and lowercase letters, respectively.
RestartPageNumbers
Boolean: if set, page numbering starts over at this section.
FirstPageNumber
Integer: starting page number for this section
if RestartPageNumbers is set (1).
The rtf:documentFormat command
specifies formatting properties for the document as a whole.
This command is optional; the default values for these
parameters should be sufficient unless you need finer control
over the layout.
rtf:documentFormat {
param value
...
}
Available document formatting parameters are:
PaperWidth
PaperHeight
Dimension: specify the width and height of the paper.
Default depends on the output device;
typically US letter (8.5in by 11in).
PaperSize
Shorthand way of specifying PaperWidth and PaperHeight.
One of the symbolic values
A4, A5, B5, Letter, Legal, or Executive.
LeftMargin
RightMargin
TopMargin
BottomMargin
Dimension: specifies the left, right, top, and bottom page margins.
May be overridden on a per-section basis.
MirrorMargins
Flag: if set, swaps the values of LeftMargin and RightMargin
on verso (even-numbered) pages.
MirrorMargins is only valid if TwoSide is set.
NOTE -- The RTF spec says only that this control word
``switches margin definitions on left and right pages,''
which is ambiguous.
By experimentation,
LeftMargin corresponds to the ``inner'' margin
and RightMargin corresponds to the ``outer'' margin,
at least in Word for Windows 95 Version 7.
Landscape
Flag: if set, the entire document is set with landscape
orientation.
NOTE -- It is unclear how the top, bottom, left, and right margins
are interpreted if landscape orientation is specified.
Protection
Enables ``document protection'' for programs which support this feature.
Allowable values are:
AllProtected
Document may not be modified.
Annotations
Document may be annotated but not modified.
Revisions
Document may be modified, but revision tracking is enabled.
Hyphenate
Boolean: enables automatic hyphenation for this document.
HyphenationHotZone
Dimension: specifies the ``hyphenation hot zone,''
the distance from the right margin in which words may be hyphenated.
HyphenationLadderCount
Integer: maximum allowable number of consecutive hyphenated lines.
HyphenateAllCaps
Boolean: allow hyphenation for words consisting of all capital letters
if set.
Word uses the paragraph style names
Heading 1, Heading 2, etc.,
as the source text for building a table of contents.
Use these names as the description
for heading entries in the text to facilitate automatic TOC generation.
Word applies the paragraph styles
TOC 1, TOC 2, etc.,
to entries in automatically-generated tables of contents.
If you include definitions for these styles in the stylesheet,
Word will use them to format the table of contents.
Call rtf:start after all declarations and before
writing any output. Call rtf:end at the end of processing.
rtf:start
Begins the top-level RTF group,
emits the style sheet and other header information,
and
sets any document-wide formatting properties
specified by rtf:documentFormat.
rtf:end
Closes the top-level RTF group.
Must be called at the end of processing.
The basic unit of text in RTF is the paragraph.
In RTF, a paragraph is any block of displayed text --
including section headings, list items, and table cell entries --
not necessarily a conventional paragraph.
rtf:startPara and rtf:endPara delimit the
start and end of paragraphs.
styleid is the name of a paragraph style defined with
rtf:paraStyle.
Since paragraphs do not nest, rtf:endPara is optional.
rtf:startPhrase styleid
# ...
rtf:endPhrase
Use rtf:startPhrase and rtf:endPhrase to apply
special formatting to text within a paragraph.
styleid is the name of a character style defined with
rtf:charStyle.
rtf:endPhrase is not optional.
Phrase boundaries must not cross paragraph boundaries.
(Actually RTF doesn't care if they do, but this confuses RATFINK).
RTF documents may optionally be broken into sections.
rtf:startSection styleid
rtf:endSection
styleid is a section style declared with rtf:sectStyle.
Since sections do not nest in RTF, rtf:endSection is optional.
Writes text to the output file,
escaping backslashes and braces so they are not interpreted
as RTF markup.
rtf:text makes sure that the output is inside
a paragraph.
If not, it starts a new paragraph
and issues a warning.
rtf:text also replaces sequences of
two consecutive hyphens with an en-dash,
three hyphens with an em-dash,
two backquotes (`) with a left double quote,
and two apostrophes (') with a right double quote.
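The kind of substitution described here can be approximated in plain Tcl with a single string map call. This is a sketch of the idea and not the actual rtf:text implementation: the proc name escapeRtfText is hypothetical, and the use of the standard RTF control words \emdash, \endash, \ldblquote, and \rdblquote is an assumption about what gets emitted.

```tcl
# Hypothetical stand-in for the escaping that rtf:text is described as doing.
proc escapeRtfText {text} {
    # string map makes one left-to-right pass and tries the keys in list
    # order, so the three-hyphen key has to come before the two-hyphen key.
    return [string map [list \
        "\\"  "\\\\" \
        "\{"  "\\\{" \
        "\}"  "\\\}" \
        "---" "\\emdash " \
        "--"  "\\endash " \
        "``"  "\\ldblquote " \
        "''"  "\\rdblquote " \
    ] $text]
}

puts [escapeRtfText "``a--b'' {c}"]
```

Because string map never rescans its own output, the backslashes introduced by the replacements are not escaped a second time.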
rtf:insert data
Inserts data into the current paragraph verbatim,
leaving backslashes and braces as-is.
rtf:write "data"
rtf:write inserts data into the output verbatim.
data may contain RTF control codes.
NOTE -- Be very careful when using rtf:write to
generate RTF commands directly.
These commands generate a ``hard'' tab, line break, page break,
and column break control word, respectively.
rtf:tab and rtf:lineBreak may only be used inside a paragraph.
RTF destination groups are used for text that does not appear
in the main flow; e.g., page headers or footnotes.
The rtf:divert command starts a new destination group.
rtf:divert destination
# generate data for destination...
rtf:undivert
Header and footer text is specified in
destination groups.
There are several different destinations related to headers
and footers; which ones are applicable depends on various
document and section style properties.
Header
Footer
LeftHeader
LeftFooter
RightHeader
RightFooter
FirstPageHeader
FirstPageFooter
The Header and Footer destination groups
specify the default header and footer, respectively.
LeftHeader, LeftFooter, RightHeader and RightFooter
specify the header and footer for left (verso) and right (recto)
pages; these are only applicable if the TwoSide document formatting property is set.
FirstPageHeader and FirstPageFooter specify
the header and footer for the first page of the section;
these are only applicable if the HasTitlePage section formatting property is
specified for the section.
Headers and footers should be specified immediately
after the call to rtf:startSection.
If a particular applicable header or footer is not specified,
then it is inherited from the previous section.
Headers and footers contain ordinary paragraph text.
The PageNumber special character
may be useful in headers and footers.
rtf:special FootnoteNumber
rtf:divert Footnote
# generate footnote text ...
rtf:undivert
Footnotes are ``anchored'' to the character that
immediately precedes the destination group.
Use the FootnoteNumber special character
to obtain automatically-numbered footnotes.
The following document-wide formatting properties
affect how footnotes are formatted; they may be specified
with the rtf:documentFormat command prior to the start of output.
FootnoteNumbering
Format for automatic footnote numbers.
One of Arabic, UCRoman, LCRoman,
UCAlpha, or LCAlpha,
for arabic (decimal), uppercase roman numerals, lowercase roman numerals,
uppercase letters, and lowercase letters, respectively.
FootnoteRestart
One of the following:
AtPage
Footnote numbers restart at the beginning of each page.
Continuous
Footnotes are numbered continuously.
AtSection
Footnote numbers restart at the beginning of each section.
FootnoteLocation
Specifies that footnotes appear other than at the end of the page.
Possible values:
EndOfSection
Footnotes appear at the end of the section.
EndOfDocument
Footnotes appear at the end of the document.
If FootnoteLocation is not specified, footnotes appear at
the bottom of each page.
FootnotePlacement
Specifies placement of footnotes on the page.
One of:
PageBottom
Footnotes are placed at the bottom of the page.
BeneathText
Footnotes appear directly beneath the text.
NOTE -- RTF also has ``alternate'' footnotes, used to
put both footnotes and endnotes in the document.
RATFINK does not support alternate footnotes.
RTF allows sections of text to be defined as a bookmark.
It is not clear from the RTF specification what this feature does;
presumably it is used by word processing software.
rtf:startBookmark name
rtf:endBookmark name
The bookmark name may be any character data.
rtf:startBookmark must be followed by a matching
rtf:endBookmark; bookmarks may overlap, however.
NOTE -- Table support is still in beta. This interface
has not been very well tested or debugged, and is subject to change.
In RTF, a table is a consecutive series of rows,
each of which contains a series of cells.
Cells contain either a series of one or more paragraphs
or inline text.
RTF has no explicit control words
for the beginning and end of a table.
Instead, tables are specified as a sequence of rows.
Cell properties (sizes and rules) are specified
all at once at the beginning of each row,
followed by the cells themselves.
RATFINK makes the following simplifying assumptions:
Horizontal and vertical rules always extend all the way across the table,
except where they cross spanning cells.
rtf:startTable begins a table.
Exactly one of -numcols, -relwidths, or -abswidths
must be specified to define the number of columns in the table.
-numcols n
Specifies that the table has n equal-width columns.
-abswidths "w1 w2 ... wn"
Specifies that the table has n columns,
with the specified widths, where each w is
a dimension specification.
-relwidths "w1 w2 ... wn"
Specifies that the table has n columns.
The width specifiers w1 through wn
are integers.
Each w is interpreted as a relative width
and the total width of the table is proportionally divided among the
columns.
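To make the proportional division concrete, here is a small plain-Tcl sketch. It is not taken from RATFINK; the proc name divideWidth is hypothetical, widths are expressed in twips, and truncating integer division is an assumption, so the column widths may add up to slightly less than the total.

```tcl
# Hypothetical helper: divide a total width (in twips) among columns
# in proportion to a list of integer relative widths.
proc divideWidth {total relwidths} {
    set sum 0
    foreach w $relwidths { incr sum $w }
    set widths {}
    foreach w $relwidths {
        # integer arithmetic; any remainder is simply dropped
        lappend widths [expr {$total * $w / $sum}]
    }
    return $widths
}

# 9360 twips is 6.5in; the middle column gets half of it.
puts [divideWidth 9360 {1 2 1}]   ;# 2340 4680 2340
```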
The other options are:
-width dimension
Specifies the total width of the table.
Ignored if -abswidths is specified.
Default: the width of the page.
(Actually, the default is only a guess at the page width,
since RATFINK does not currently keep track of this...)
-frame rulestyle
-rowsep rulestyle
-colsep rulestyle
Specifies the default outer borders,
default horizontal rules between rows,
and default vertical rules between columns.
rulestyle is the name of a rule style
previously defined with rtf:ruleStyle.
(Note: -frame only works for the
top, left, and right table borders;
the bottom border must be re-specified on the final row.)
-align (left|right|center)
Specifies the alignment of the table as a whole relative to the page.
specifies that this row contains only m cells;
cell i spans si columns of the table.
The sum of s1 through sm must equal the total
number of table columns.
-toprule rulestyle
Specifies the rule above this row.
Only legal for the first row in the table.
Default: the table -frame option if this is the first row,
the bottom rule of the preceding row otherwise.
-botrule rulestyle
Specifies the rule below this row.
Default: the -rowsep rule style for the table.
-colsep rulestyle
Specifies the style of all vertical rules between cells for this row.
Default: the table -colsep value.
-height dimension
Specifies the minimum height of this row.
Default: The height of the enclosed text.
rtf:endRow and rtf:endCell are optional.
rtf:startCell [ paraStyle ]
...
rtf:endCell
Begins a new cell. Cells can contain inline text, or
a series of paragraphs.
rtf:endCell is optional. It marks the end of the current cell.
A field in RTF is a hook for specifying program-specific
commands to the word processor reading the RTF file.
Fields contain two parts: a field instruction,
which is the actual command;
and an optional field result, which holds the results of processing
the field. (The field result may be used to provide
default text in case the application does not understand
how to process the field instruction.)
Inserts a field instruction.
instruction is any character data; backslashes and other
special characters will be escaped before writing to the output.
The second parameter is optional; if supplied it will be used
as the text of the field result.
rtf:startField "field instruction"
# ... generate field result ...
rtf:endField
Like rtf:insertField, except the field result
may contain arbitrary RTF text instead of a simple character string.
NOTE -- Many field instructions have optional parameters that
are specified with sequences beginning with a backslash.
Note that these are not RTF control words
(except for \fldalt, but that's too horrifying to get into...).
The available field instructions vary from application to application;
check the documentation for the program in question.
The commands described in the previous section and defined
in rtflib.tcl are all general-purpose Tcl utilities for
creating RTF, and may be used independently of Cost or SGML.
The file RTF.spec is a high-level Cost script
to assist in converting SGML to RTF.
rtf:convert specname
To specify the processing for a particular DTD,
define a Cost specification supplying
RATFINK processing parameters for each element type,
then call the rtf:convert command with the name of
your specification.
(This is in addition to
defining a stylesheet and document formatting properties as described
above.)
NOTE -- A Cost specification maps document nodes to parameters
based on queries; see the Cost reference manual for full details.
For example:
specification rtfSpec {
{element P} {
rtf para
paraStyle body
}
{elements "DFN EM"} {
rtf phrase
charStyle hp0
}
{elements "UL OL"} {
rtf #IMPLIED
}
{element LI} {
rtf para
paraStyle litem
}
{element LI in UL} {
prefix {$rtfSpecial(Bullet)$rtfSpecial(Tab)}
}
{element LI in OL} {
prefix {[childNumber].$rtfSpecial(Tab)}
}
{element PRE} {
rtf linespecific
paraStyle verbatim
}
{element H1} { rtf para paraStyle heading1 }
{element H2} { rtf para paraStyle heading2 }
... etc.
}
rtf:convert rtfSpec
TIP -- rtf:convert is reentrant and may be
called recursively, possibly with a different specification,
for complex processing.
There is one mandatory parameter for every element: rtf.
This specifies one of the following ``architectural forms'':
para
A paragraph or other displayed block.
The required parameter paraStyle specifies the id of the paragraph
style to use for this element.
phrase
Inline text with special character formatting.
The required parameter charStyle specifies the id of the character
style to use for this element.
section
A section. The required parameter sectStyle specifies the id of the section
style to use for this element.
linespecific
A displayed block in which record-ends (newlines) are significant.
special
Special-purpose processing.
Use the parameters startAction and endAction
to specify how to process this element.
#IMPLIED
No special processing for this element.
The optional parameters startAction and endAction
are valid for every element.
They specify Tcl code to execute at the start and end of the element,
respectively.
The code is evaluated at global scope.
Tcl variable- and command- replacement is performed on the
charStyle, paraStyle, and sectStyle parameters.
NOTE -- The RTF translation routine prints a warning if there is no
rtf parameter specified for an element.
The following parameters may be specified for any element node,
and may be used to specify automatically-generated text:
before
Data to insert before processing the element
prefix
Data to insert at the beginning of the element,
after the start-of-element processing.
suffix
Data to insert at the end of the element,
before the end-of-element processing.
after
Data to insert after processing the element.
Tcl variable- and command- replacement is performed on these
parameters with the subst command.
The result is inserted directly into the output file and
may contain RTF control words.
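The effect of subst on such a parameter can be tried out in plain Tcl. In the sketch below the rtfSpecial values and the childNumber proc are stand-ins invented for illustration; in a real conversion they come from RATFINK and Cost, and their actual values may differ.

```tcl
# Assumed values for illustration only; the real rtfSpecial array
# is provided by RATFINK.
set rtfSpecial(Bullet) "\\bullet "
set rtfSpecial(Tab)    "\\tab "

# Stand-in for the childNumber command used in the specification example.
proc childNumber {} { return 3 }

# subst replaces variable references and bracketed commands in the
# parameter text; the substituted values are not scanned again.
puts [subst {$rtfSpecial(Bullet)$rtfSpecial(Tab)}]
puts [subst {[childNumber].$rtfSpecial(Tab)}]
```

Since the result goes into the output as-is, any backslashes or braces it contains are treated as RTF markup.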
You should use the rtf:Escape command
if the value might contain data that looks like an RTF control instruction;
for example,
Paragraphs do not nest in RTF, but they do in many SGML applications.
For example, it is often legal to include
bulleted lists,
code examples,
equations,
and other displayed material in the middle of a paragraph.
For para-form elements,
the optional continuedStyle parameter names a paragraph style
for subsequent blocks of text that are part of a logical
paragraph in the SGML document but are treated as separate
paragraphs in RTF.
Normally, record-ends are converted to spaces.
If rtf linespecific is specified for an element,
then record-ends are processed as hard line breaks.
The element is formatted as a single paragraph,
and the paraStyle parameter also applies.
If rtf section is specified for an element, RATFINK
starts a new section with rtf:startSection at the
beginning of the element.
(It does not call rtf:endSection at the end of the element,
since in general sections may nest in an SGML document
while they may not in RTF; keep this in mind.)
The startAction and endAction parameters are
also evaluated for section-form elements.
These parameters
contain arbitrary Tcl code, evaluated at top-level (global) scope.
(startAction may be used to generate headers and footers,
for example.)
Note that this DTD is not actually used by RATFINK; it is for
descriptive purposes only.
Conceptually, the mapping from source document elements onto
architectural forms is determined by the rtf parameter,
which specifies the result element type; other parameters correspond
to result attributes. The %headings; architectural forms
do not correspond to source elements; they are instead generated
by the application.
There is currently no way to set formatting properties
for a section, paragraph, or phrase without defining a stylesheet entry.
Handling nested lists and other such things is more difficult
than it ought to be.
RATFINK does not always output control words in the order
prescribed by the RTF syntax productions
(but neither does Microsoft Word, for what it's worth...)
There is no support for pictures, drawing objects, embedded
objects, or other features.
NOTE -- I'd really like to support bitmapped images,
but the RTF spec is extremely unhelpful on this point.
Does not handle context-sensitive style information very well.
For example, if the DTD allows bulleted lists inside regular
paragraphs and inside notes, and the desired formatting is to
set regular paragraphs in a roman font and notes in a
sans-serif font, then there must be distinct RTF paragraph styles
for lists inside notes and lists inside regular paragraphs.
If a style overrides a parameter in its base style,
the corresponding control word will be emitted more than once.
Since the later setting takes precedence, this usually makes no difference,
but it means that ``flag'' control words cannot be turned off
if they are turned on in a base style, and that tab stops
in a base paragraph style may not be cleared.
I've done my best to make sure that this library only generates
legal RTF (as far as my understanding of the specification goes),
but it is still possible in certain obscure circumstances
for the output to crash Word.
RTF has control words to embed table of contents entries,
and index entries; however, there are no control words
to build a table of contents or index.
Consequently, I haven't bothered to support these features
in RATFINK.
NOTE -- With Word for Windows 95 Version 7 you can do these things with
field instructions, so they may be useful after all.
RTF supports automatic numbering of lists and headings,
but not very well. For example, if you include
a paragraph inside a list item Word resets the counter
for the next item; and if you have two numbered
lists in a row with no intervening paragraphs there is no
way to restart the list numbers at 1.
Consequently, I haven't bothered to support these features either.
Many SGML applications assume an application convention
whereby multiple spaces are equivalent to a single space.
Many text formatting utilities (TeX, n/troff, Scribe, etc.)
work this way, but RTF does not: all spaces are significant.
RATFINK does not do anything to compress multiple spaces
by default; however, you can do some tricks with short reference
maps to take care of this in the parser.
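A different approach from the short reference map trick, for scripts that call the Tcl routines directly, is to collapse the whitespace in Tcl before the text is written. The sketch below is plain Tcl and not part of RATFINK; the proc name collapseSpaces is hypothetical. It leaves newlines alone so that record-end handling is unaffected.

```tcl
# Hypothetical pre-processing step (not a RATFINK command): collapse runs
# of spaces and tabs into a single space, leaving newlines untouched.
proc collapseSpaces {text} {
    regsub -all {[ \t]+} $text " " result
    return $result
}

puts [collapseSpaces "too   many \t spaces"]   ;# too many spaces
```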
The output of RATFINK has been extensively tested with
Microsoft Word for Windows 95 Version 7, and to some
extent with Microsoft Word for Macintosh Version 5.1a.
I have no idea how well it will work, if at all,
with other word processors; chances are good that there
will be differences in other applications' interpretations
of the specification.
It takes a lot of work to get any sort of decent typography out of Word.
Another tool for converting SGML to RTF
is JADE, James Clark's amazing DSSSL engine.
See http://www.jclark.com/ for details.
This program works under Win32 and most Unix variants.
NOTE -- The 1.4 RTF spec also includes a sample RTF reader program,
but it isn't very good; Paul Dubois' RTF tools are a better bet.
The Microsoft material is only supplied as a self-extracting DOS executable.
If you don't have a DOS system available,
you're not completely out of luck:
the Info-ZIP project UNZIP utility
runs on just about every system imaginable
and is able to unpack this format.
See ftp://ftp.uunet.net/pub/archiving/zip/
and elsewhere (ask Archie; it's widely mirrored).
You'll still need a copy of Microsoft Word to read
the RTF specification, though.
Information about WINHELP may be found in the
Windows Help Authoring Toolkit at
ftp://ftp.microsoft.com/Softlib/mslfiles/what6.exe,
and
the Usenet newsgroup comp.os.ms-windows.programmer.winhelp,
and its related FAQ.
WARNING -- Winhelp is not for the faint of heart or weak of stomach.
RTF is pretty messed up, but Winhelp is a complete abomination.