|
Simple translations with Cost
Joe English

1 Introduction

Cost is a powerful but somewhat complex system. The Simple module provides a simplified, high-level interface for developing translation specifications.

2 Getting started

A large number of SGML translation tasks involve nothing more than
The Simple module is designed to handle these types of translations. It makes a single pass through the document, inserting text and optionally calling a user-specified script at the beginning and end of each element. The translated document is written to standard output.

To load this module, put the command

    require Simple.tcl

at the beginning of the specification script. Next, define a translation specification as follows:

    specification translate {
        specification-rules...
    }

The specification-rules is a paired list matching queries with parameter lists. The queries are used to select element nodes, and are typically of the form

    {element GI}

or

    {elements "GI GI..."}

where each GI is the generic identifier or element type name of the elements to select. Any Cost query may be used, including complex rules like

    {element TITLE in SECTION withattval SECURITY RESTRICTED}

or simple ones like

    {el}

The latter query, el, matches all element nodes; it can be used to specify default parameters for elements which don't match any earlier query.

The parameter lists are also paired lists, matching parameters to values. The Simple module translation process uses the following parameters:
Tcl variable, backslash, and command substitution are performed on the before, after, prefix, and suffix parameters. This takes place when the element is processed, not when the specification is defined. The values of these parameters are not passed through the cdataFilter command before being output.

NOTE -- Remember to "protect" all Tcl special characters by prefixing them with a backslash if they are to appear in the output. The special characters are dollar signs ($), square brackets ([ and ]), and backslashes (\). See the Tcl documentation on the subst command for more details.
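The effect of this substitution can be tried in plain tclsh, without loading Cost. The following is a minimal sketch that assumes the parameter values are expanded the way the standard Tcl subst command expands a string; the variable title and the procedure sectionNumber are hypothetical names invented for the illustration and are not part of Cost or the Simple module.

```tcl
# Hypothetical stand-ins for document state; neither is part of Cost.
set title "Overview"
proc sectionNumber {} { return 3 }

# Braces delay substitution, just as they do inside a specification:
# nothing is expanded when the value is defined.
set prefix {\n.SH [sectionNumber] $title\n}

# subst performs backslash, command, and variable substitution in one step.
set expanded [subst $prefix]
puts $expanded

# Special characters protected with a backslash come through literally.
set protected {Price: \$5 \[approx\] C:\\TEMP}
puts [subst $protected]
```

The first value expands to a line reading ".SH 3 Overview" with a newline on either side. In the second value every dollar sign, bracket, and backslash is protected, so subst only strips the protecting backslashes.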
The cdataFilter parameter is the name of a filter procedure, a one-argument Tcl command. Cost passes each chunk of character data to this procedure and outputs whatever the procedure returns. The initial value of cdataFilter is the identity command, which simply returns its input:

    proc identity {text} {return $text}

The sdataFilter parameter works just like cdataFilter, except that it is used for system data (the replacement text of SDATA entity references). The initial sdataFilter is also identity.

The cdataFilter and sdataFilter parameters are inherited by subelements; that is, if they are not specified for a particular element, the currently active filter procedure is used by default.

3 Other utilities

The translateContent procedure works just like the Cost built-in command content, except that the content of CDATA and SDATA nodes is filtered through the current cdataFilter and sdataFilter, respectively.

4 Example

The following specification translates a subset of HTML to nroff -man macros. (Well, actually it doesn't do anything useful; it's just to give an idea of the syntax.)
    require Simple.tcl

    specification translate {
        {element H1} {
            prefix "\n.SH "
            suffix "\n"
            cdataFilter uppercase
        }
        {element H2} {
            prefix "\n.SS "
            suffix "\n"
        }
        {elements "H3 H4 H5 H6"} {
            prefix "\n.SS "
            suffix "\n"
            startAction {
                # nroff -man only has two heading levels
                puts stderr "Mapping [query gi] to second-level heading"
            }
        }
        {element DT} {
            prefix "\n.IP \""
            suffix "\"\n"
        }
        {element PRE} {
            prefix "\n.nf\n"
            suffix "\n.fi\n"
        }
        {elements "EM I"} {
            prefix "\\fI"
            suffix "\\fP"
        }
        {elements "STRONG B"} {
            prefix "\\fB"
            suffix "\\fP"
        }
        {element HEAD} {
            cdataFilter nullFilter
        }
        {element BODY} {
            cdataFilter nroffEscape
        }
    }

    # Discard all character data
    proc nullFilter {text} {
        return ""
    }

    proc nroffEscape {text} {
        # change backslashes to '\e'
        regsub -all {\\} $text {\\e} output
        return $output
    }

    # Upper-case the text, then escape it for nroff
    proc uppercase {text} {
        return [nroffEscape [string toupper $text]]
    }

5 Notes

The specification order is important: queries are tested in the order specified, so more specific queries must appear before more general ones. Parameters are evaluated independently of one another. For example,

    specification translate {
        {element "TITLE"} {
            cdataFilter uppercase
        }
        {element TITLE in SECT in SECT in SECT} {
            prefix "<H3>"
            suffix "</H3>\n"
        }
        {element TITLE in SECT in SECT} {
            prefix "<H2>"
            suffix "</H2>\n"
        }
        {element TITLE in SECT} {
            prefix "<H1>"
            suffix "</H1>\n"
            startAction {
                puts $tocfile [content]
            }
        }
    }

The parameter cdataFilter uppercase applies to all TITLE elements, regardless of where they occur, and the startAction parameter applies to any TITLEs which are children of a SECT, even if an earlier matching rule specified a prefix or suffix.

As its name implies, the Simple module is not very sophisticated, but it should be enough to get you started. |
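The lookup behaviour described in the Notes can be illustrated with a toy model in plain Tcl (8.5 or later, for dict). This is only a sketch of the rule "for each parameter, the first matching rule that specifies it wins", inferred from the description above; it is not how the Simple module is actually implemented. The procedure resolve, the variable depth (the number of SECT elements a TITLE is nested in, standing in for a real Cost query), and the action name logTitle are all hypothetical.

```tcl
# Toy rules: each test is an expression on the hypothetical variable "depth",
# paired with a parameter list, in the same order as the TITLE example.
set rules {
    {$depth >= 0} {cdataFilter uppercase}
    {$depth >= 3} {prefix <H3> suffix </H3>}
    {$depth >= 2} {prefix <H2> suffix </H2>}
    {$depth >= 1} {prefix <H1> suffix </H1> startAction logTitle}
}

# Collect, for each parameter, the value from the first matching rule
# that mentions it. Later matching rules only fill in missing parameters.
proc resolve {rules depth} {
    set result [dict create]
    foreach {test params} $rules {
        # Skip rules whose test does not match this element.
        if {![expr $test]} continue
        dict for {name value} $params {
            if {![dict exists $result $name]} {
                dict set result $name $value
            }
        }
    }
    return $result
}

# A TITLE nested in two SECT elements.
puts [resolve $rules 2]
```

For a depth of 2, the model takes cdataFilter from the first rule, prefix and suffix from the second-level rule, and startAction from the last rule, even though an earlier matching rule already supplied a prefix and suffix.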