
Conference vaxuum::document_ft

Title:	DOCUMENT T1.0
Notice:	New notesfile (DOCUMENT.NOTE) now available (see note 897)
Moderator:	CLOSET::ADLER

Created:	Mon Feb 09 1987
Last Modified:	Thu Oct 31 1991
Last Successful Update:	Fri Jun 06 1997
Number of topics:	897
Total number of notes:	4397

372.0. "How to extract formatting tags for translation" by MUNSBE::RUEDEL (Wilfried Rüdel, Munich, DTN: 757-0226) Wed May 13 1987 13:36

    My question is relevant for on-line translation:
    
    Does anyone know a procedure to extract all tags from a DOCUMENT
    source file and put them into another file?
    
    The 'empty' format file, consisting of tags only, will then be used
    as a starting point for a translator to type in the translation.
    The idea is to preserve the format of a text while translating it.
    
    The translator will use two windows on his screen:
    
    - one to READ the source text (e.g. in English)
    - the other with the extracted tags to type in his translation  
      in the target language (e.g. French)
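A minimal Python sketch of the extraction being asked for might look like the following. It assumes DOCUMENT tags look like `<name>`, optionally followed directly by a parenthesized argument list such as `(arg1\arg2)`. This syntax is an assumption for illustration and has not been checked against the DOCUMENT manuals. The names `split_tags` and `skeleton` and the `[[n]]` slot markers are hypothetical. Each run of ordinary text is replaced by a numbered slot, so the translator's window keeps the tag layout while the source text stays in the other window.

```python
import re

# Assumed tag syntax (illustrative, not checked against the DOCUMENT manuals):
# <name> optionally followed directly by an argument list such as (arg1\arg2).
TAG_START = re.compile(r"<([A-Za-z_][A-Za-z0-9_$]*)>")


def split_tags(source):
    """Split source into ("tag", text) and ("text", text) tokens, in order."""
    tokens = []
    pos = 0
    while True:
        m = TAG_START.search(source, pos)
        if m is None:
            break
        if m.start() > pos:
            tokens.append(("text", source[pos:m.start()]))
        end = m.end()
        if end < len(source) and source[end] == "(":
            # Walk to the matching ")" so nested parentheses (and any tags
            # inside the arguments) stay part of this tag.
            depth = 0
            for i in range(end, len(source)):
                if source[i] == "(":
                    depth += 1
                elif source[i] == ")":
                    depth -= 1
                    if depth == 0:
                        end = i + 1
                        break
            else:
                end = len(source)  # unbalanced: treat the rest as arguments
        tokens.append(("tag", source[m.start():end]))
        pos = end
    if pos < len(source):
        tokens.append(("text", source[pos:]))
    return tokens


def skeleton(source):
    """Keep every tag; replace each non-blank text run with a numbered slot."""
    out = []
    slot = 0
    for kind, value in split_tags(source):
        if kind == "tag" or not value.strip():
            out.append(value)  # tags and pure whitespace keep the layout
            continue
        slot += 1
        lead = value[: len(value) - len(value.lstrip())]
        trail = value[len(value.rstrip()):]
        out.append(f"{lead}[[{slot}]]{trail}")
    return "".join(out)


src = "<chapter>(Introduction\\intro)\nThis is <emphasis>(very) simple.\n"
print(skeleton(src), end="")
# <chapter>(Introduction\intro)
# [[1]] <emphasis>(very) [[2]]
```

Because tag arguments are kept whole, translatable text inside them (such as `very` in the example) is not extracted. The two-part forms of `<comment>` and `<literal>` also get no special handling.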

T.R	Title	User	Personal Name	Date	Lines
372.1		AUTHOR::WELLCOME	Steve	`Fri May 15 1987 10:38`	1
	TECO could probably do it; do you know any TECO wizards?
372.2	Who is fluent in TPU?	MUNSBE::RUEDEL	Wilfried Rüdel, Munich, 757-0226	`Fri May 15 1987 14:31`	2
	No, but I would have thought that there is a nice TPU procedure around somewhere. So, if someone knows of something like that...
372.3	Possibility	BUNSUP::LITTLE	Todd Little NJCD SWS 323-4475	`Fri May 15 1987 15:37`	6
	In the note on DECspell dictionaries in this conference I posted a trivial SCAN program that will locate all tags in a file and extract just the tags and place them in another file. It loses all the arguments, etc. so I'm not sure how helpful it would be for your translation needs. -tl
372.4	translation tools	VAXUUM::KOHLBRENNER		`Mon May 18 1987 10:24`	28
	Extracting tags from a file is a non-trivial task. Tags have arguments, and the arguments can contain other tags. Sometimes the arguments contain text that would need to be translated; other times they contain keywords that probably should not be translated. The <comment> and <literal> tags each come in two formats, and the text they enclose may or may not contain other tags. Is this text, with or without its tags, intended for translation? Separating the tags from the text seems to me only half the problem if there is an intention to later merge the tags back into the translated text. How will the merge be accomplished? What are the "markers" that tell how to put the two pieces back together? Won't some tags disappear as part of the translation? The writer of English may want to add emphasis to a word or phrase using the <emphasis> tag, while the translator may find it easy to convey that emphasis in the target language without bolding or italics. Computerized aids for doing the translation sound like a worthwhile project, but it seems they will require more than a simple TPU procedure to be very useful... bill
372.5	separation of syntax and semantics (form and functions?)	ATLAST::BOUKNIGHT	Everything has an outline	`Mon May 18 1987 11:26`	8
	What such a tool should really have access to is the exact same parser used in GUTENTAG. With the work to be done in adopting SGML, maybe the DOCUMENT folks could consider breaking the front end up into separate syntax-handling and semantics-handling code, making the syntax-handling code available to other users such as translation aids, a "pretty" formatting program for SGML, etc. jack
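The merge question in reply 372.4 ("What are the 'markers' that tell how to put the two pieces back together?") can be made concrete with a small Python sketch. It assumes the tag-only file marks each translatable text run with a numbered marker such as `[[1]]`, and that the translated runs come back keyed by those numbers. Both the marker syntax and the `merge` function are hypothetical. The merge is purely syntactic, in the spirit of reply 372.5: it knows nothing about what any tag means, only where each segment goes. Tags the translator deletes from the skeleton (for example, an unneeded `<emphasis>`) simply drop out.

```python
import re

# Hypothetical marker convention: each translatable text run in the
# tag-only file was replaced by "[[n]]", numbered from 1.
MARKER = re.compile(r"\[\[(\d+)\]\]")


def merge(skeleton, translations):
    """Replace every [[n]] marker with translations[n].

    Raises ValueError if a marker has no translation or a translation
    has no marker, so a dropped or mistyped marker is caught early.
    """
    found = {int(n) for n in MARKER.findall(skeleton)}
    supplied = set(translations)
    missing = sorted(found - supplied)
    unused = sorted(supplied - found)
    if missing or unused:
        raise ValueError(f"missing segments {missing}, unused segments {unused}")
    # A function replacement inserts the text literally, so backslashes in
    # the translation are not treated as regex escapes.
    return MARKER.sub(lambda m: translations[int(m.group(1))], skeleton)


skel = "<chapter>(Introduction\\intro)\n[[1]] <emphasis>(very) [[2]]\n"
print(merge(skel, {1: "Ceci est", 2: "simple."}), end="")
# <chapter>(Introduction\intro)
# Ceci est <emphasis>(very) simple.
```

Argument text such as the chapter title and the emphasized word stays in English here. Reaching it would take the parser-level view of tag arguments described above rather than a purely textual split.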