
Conference vaxuum::document_ft

Title:	DOCUMENT T1.0
Notice:	New notesfile (DOCUMENT.NOTE) now available (see note 897)
Moderator:	CLOSET::ADLER

Created:	Mon Feb 09 1987
Last Modified:	Thu Oct 31 1991
Last Successful Update:	Fri Jun 06 1997
Number of topics:	897
Total number of notes:	4397

675.0. "How do I automate the index process?" by NCADC1::PEREZ (The sensitivity of a dung beetle.) Wed Jul 15 1987 17:58

    I just came back from ?running? an informal Document seminar.  It
    was well received and I look forward [sic] to having our CPU absolutely
    buried by happy people documenting.
    
    HOWEVER, I told the folks that to do an index you had to manually put
    the <X> tag on each entry you wanted in the index, every time you
    wanted it referenced.  For example, if you wanted the word "HORSESHOE"
    indexed and referenced it on 4, 103, 157, 322, you would have to put an
    entry at each of those points in the source.  If a page break occurred
    the reference would be incorrect in the index. 
    
    I D*** near got LYNCHED!!!  The general comment was "No wonder our
    documents have such lousy indexes, if it takes that much work."
    References were made to unnamed primitive products that allow the
    user to put in a list of references to be indexed that are then
    permuted and correctly correlated with the sources. 
    
    Am I wrong?  Does Document do this (hope, hope)?  Is there some
    automatic way to get the references to index elements correct for all
    occurrences in a document?  Or, are we doomed to having 20 lines
    of index with only the initial reference for an 8000 line document? 

    Dave P

T.R	Title	User	Personal Name	Date	Lines
675.1	Modification of <X> required/wished/called for?	IJSAPL::KLERK	Theo de Klerk	`Wed Jul 15 1987 18:35`	8
	One of those primitive products is Runoff, which will accept >string to enter "string" in the index. Perhaps, as a wish, <X>(text) should leave the "text" in the paragraph as well as entering it in the index. It sounds like a not-too-difficult solution to implement, though highly incompatible with previously made documents, which would then suddenly show two "text"s in the final page output.
	Theo
675.2	I agree and disagree	COOKIE::JOHNSTON		`Wed Jul 15 1987 18:56`	36
	RE: .0
	I couldn't let this one go by without the comment that it takes more than an automatic feature to make a truly useful index. You must consider more than just where words appear on a given page. You must also think about user concepts and the many different ways in which a user will try to look up information. Take something simple like the print command. Users will look in the index for any or all of the following leaders (or something similar to them):
		Commands
		Command qualifiers
		Output
		PRINT command
		Printing
		Qualifiers
	This is a really small example that doesn't even consider logical queue names, destinations, offsets, whatever. You can't anticipate every option; and even if you could, you have to make conscious decisions about what to include or not because resources for paper, people, time, ad nauseam are always scarce.
	I don't disagree with having more automatic features, but I think that only artificial intelligence can come close to writing as good an index as a human can. And I think that no one should be allowed to write an index before having training in it; a poor index only makes a bad book worse. The good news: a good index can go a long way towards improving the usability of a poorly organized book.
	Indexes are being mandated corporate-wide; I don't envy the task of anyone who has to develop the tools or the training.
	Rose
675.3	Nevertheless...	IJSAPL::KLERK	Theo de Klerk	`Thu Jul 16 1987 02:55`	94
	Considering the .-1 remark, and realizing that processing the same file twice might expand <X>(text) repeatedly, e.g. text<X>(text) --> texttext<X>(text), it might be better to use the Runoff approach and use something unique (like >> or @@) to prefix the word that needs to go into the index.
	A six minute Scan hack follows. The out-commented part would expand the <X> tag; the currently active part uses the >> approach.
	I agree that I will do it, but I don't like repeating myself by making <X> inserts which are identical to the word just written.
	<X>(have fun)
	Theo
	-------------------------------------------index.scn---------------------
	MODULE index_insert;
	!++
	!
	!   Author:         Theo de Klerk (IJSAPL::KLERK)
	!
	!   Creation Date:  87 06 16
	!
	!   Modified:
	!
	!   Description:
	!
	!   This program converts strings in an SDML file
	!       <x>(text)                    -->  text<x>(text)
	!       <x>(text1 <subentry> text2)  -->  text1<x>(text1 <subentry> text2)
	!       >>text                       -->  text<X>(text)
	!--

	SET valid_chars ('!'..'~');

	TOKEN index_token CASELESS {'<X>'};
	TOKEN arg_start {'('};
	TOKEN arg_end {')'};
	!...Replace >> by whatever you like as a flag for indexing
	TOKEN add_index {'>>' valid_chars...};

	!++
	!
	!   Replace                        With
	!
	!   <X>(text <subentry> text2)     text<X>(text<subentry>text2)
	!
	!--
	!... *** DEACTIVATED **
	! MACRO replace_index TRIGGER
	!     {index_token arg_start argument : FIND(')') arg_end};
	!
	!     DECLARE n : INTEGER;
	!
	!     n = INDEX( UPPER(argument), '<SUBENTRY>' );
	!     IF n <> 0
	!     THEN ANSWER argument[1..n], '<X>(',argument,') ';
	!     ELSE ANSWER argument, '<X>(',argument,')';
	!     END IF;
	! END MACRO;

	!++
	!
	!   Replace        With
	!
	!   >>text         text<X>(text)
	!
	!--
	MACRO add_index_macro TRIGGER {text : add_index };
	    ANSWER text[3..], '<X>(',text[3..],')';
	END MACRO;

	PROCEDURE main MAIN;
	    DECLARE file_name : STRING;

	    READ PROMPT ('Enter Document file name (include type): ') file_name;
	    START SCAN
	        INPUT FILE file_name
	        OUTPUT FILE file_name;
	END PROCEDURE /* main */;
	END MODULE /* INDEX_INSERT */;
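For readers without VAX SCAN at hand, the active ">>" macro in .3 can be approximated in Python. This is only a rough sketch, not a port of the Scan module: the function name expand_index_flags is made up for illustration, and the character class simply mirrors the '!'..'~' set used in .3. Like that token, it swallows trailing punctuation, so ">>horseshoe." would index "horseshoe." with the period.

```python
import re

# Flag that marks a word for indexing; ">>" mirrors the Scan hack in .3.
FLAG = ">>"

# A flagged word: the flag followed by one or more printable,
# non-space ASCII characters ('!'..'~').
_FLAGGED = re.compile(re.escape(FLAG) + r"([!-~]+)")


def expand_index_flags(text: str) -> str:
    """Rewrite every >>word as word<X>(word) (hypothetical helper)."""
    return _FLAGGED.sub(lambda m: f"{m.group(1)}<X>({m.group(1)})", text)


if __name__ == "__main__":
    print(expand_index_flags("Shoe the >>horse today"))
```

Because the flag is consumed by the rewrite, running the function over its own output changes nothing, which avoids the repeated-expansion problem described for the <X>(text) variant.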
675.4	No help today, but ...	VAXUUM::DEVRIES	M.D. -- your Device Doctor	`Thu Jul 16 1987 10:53`	27
	Somebody could write a search-and-replace kind of hack to let you search a file (or list of files) for a given word or phrase and then (1) duplicate that as the argument to <x>; (2) let you enter an argument or tag of your choosing; or (3) leave it alone. Such a thing could allow a "global search-and-replace" mode. In conjunction with a global mode, it could also have a "search-and-delete" mode, so you could visit all the inserted tags and remove them where they weren't appropriate. (And I hope the search algorithm would treat all space elements (space, tab, endline, maybe "\") as possible equivalents.)
	The ultimate solution, of course, is a combination of training and more creative tools. The pressure to direct resources to this kind of development must come from management, as the mindset shifts from "it's the last thing you gotta throw in" to "it's as important as what you say in the book" to (eventually?) "it's something you plan up-front, like the outline (contents) of the book". (And kudos to those who realize its importance today.)
	This change in outlook will come, and it won't be far away, because the index and table of contents are likely to be the primary ways of traversing online documentation. The flip-till-you-see-it method of information retrieval will not be acceptable with online documentation, at least not until new techniques and technologies are found.
	--Mark
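The global search-and-replace and search-and-delete modes suggested in .4 might look something like the following Python sketch. Everything here is an assumption for illustration: the names tag_phrase and untag_entry are invented, the phrase is assumed to consist of ordinary words, and only space, tab and line end are treated as equivalent (the "\" case is not handled). The interactive "leave it alone" choice is left out.

```python
import re
from typing import Optional

# An existing index tag such as <X>(print queue); assumes no nested parentheses.
_TAG = r"<X>\([^)]*\)"


def tag_phrase(text: str, phrase: str, entry: Optional[str] = None) -> str:
    """Append <X>(entry) after each untagged occurrence of phrase."""
    entry = phrase if entry is None else entry
    tag = f"<X>({entry})"
    # Any run of whitespace in the text matches a space in the phrase.
    words = r"\s+".join(re.escape(w) for w in phrase.split())
    escaped_tag = re.escape(tag)
    # Existing tags are matched first and copied through untouched, so the
    # phrase is never tagged inside a tag argument. The lookahead skips
    # occurrences that already carry this tag.
    pattern = re.compile(
        rf"(?P<tag>{_TAG})|(?P<hit>\b{words}\b)(?!{escaped_tag})",
        re.IGNORECASE,
    )

    def repl(m: "re.Match[str]") -> str:
        if m.group("tag") is not None:
            return m.group(0)
        return m.group("hit") + tag

    return pattern.sub(repl, text)


def untag_entry(text: str, entry: str) -> str:
    """Remove every <X>(entry) tag, leaving the running text in place."""
    return re.sub(re.escape(f"<X>({entry})"), "", text, flags=re.IGNORECASE)


if __name__ == "__main__":
    sample = "Use the print\nqueue. The Print queue is full."
    print(tag_phrase(sample, "print queue"))
```

Passing a different entry argument covers case (2) in .4, where the tag text differs from the phrase found in the running text.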
675.5	To be a TeXnician or a TeXwizard or a TeXlunie...	IJSAPL::KLERK	Theo de Klerk	`Thu Jul 16 1987 12:14`	6
	Come to think of it, the DECTeX equivalent to <X> could be a macro that does both text + \indexentry{text} (or whatever it is - I did not look). If you know that, you could enter it in your local elements and have it solved...
	Theo
675.6	Intelligence, artificial or otherwise - we need it	NCADC1::PEREZ	The sensitivity of a dung beetle.	`Mon Jul 20 1987 22:07`	15
	This has been great, but I think my original question got lost. All I want is a simple way to have the indexing find all the references to a particular element and put them in the index. For example, as in .2 with the print -- once I've told whatever to find "printing" I would like to have the appropriate index and subindex entries set up with the page numbers found throughout the document.
	Currently, about all that happens is a quick and dirty TPU run through the document indexing the <head and <subhead entries. I don't know about the rest of you, but I can't get anybody to rummage a 100 page document looking for likely index elements, much less a 3" thick design document.
	Dave P
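To make the request in .6 concrete, here is a rough Python illustration of "find every reference to a term and collect its page numbers". It is not how DOCUMENT works: DOCUMENT resolves page numbers itself from the <X> tags at format time, whereas this sketch assumes the text is already split into pages. The function name build_index and the sample pages are invented for the example.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def build_index(pages: List[str], terms: Iterable[str]) -> Dict[str, List[int]]:
    """Map each term to the 1-based page numbers on which it occurs.

    Matching is caseless and whole-word; terms that never occur are omitted.
    """
    index: Dict[str, List[int]] = defaultdict(list)
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for number, page in enumerate(pages, start=1):
            if pattern.search(page):
                index[term].append(number)
    return dict(index)


if __name__ == "__main__":
    pages = [
        "The horseshoe hangs over the door.",
        "Nothing of interest here.",
        "A HORSESHOE and a print queue.",
    ]
    for term, found in build_index(pages, ["horseshoe", "print queue"]).items():
        print(term, found)
```

A word list like this gives only the mechanical part of an index; which terms and synonyms belong in the list is still the judgment call discussed in .2.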
675.7		AUTHOR::WELLCOME	Steve	`Tue Jul 21 1987 09:11`	17
	Re: .6
	> Currently, about all that happens is a quick and dirty TPU run through
	> the document indexing the <head and <subhead entries. I don't know
	> about the rest of you, but I can't get anybody to rummage a 100
	> page document looking for likely index elements, much less a 3"
	> thick design document.
	I, for one, DO go through manuals looking for index elements. It can get tedious, but I think it's the only way to get a really good index. A good index is hard to write! A good index involves a heck of a lot more than putting in <head> and <subhead> entries, or a bunch of keywords. One has to enter the mind of the likely user of the manual and imagine what he is going to want to look up and how he's likely to try to do it. All too many of the indexes this company puts out are egregiously bad. An index needs to be WRITTEN, just the way any chapter or appendix in the manual needs to be written.
675.8	Automated indexing? No thank you!	TLE::SAVAGE	Neil, @Spit Brook	`Tue Jul 21 1987 09:25`	6
	I agree with Steve: An automated index feature would encourage careless indexing. I would never use it or encourage any writer to use such a feature to index a book. The solution is to index it as you write it; putting off the indexing until the end (as an afterthought) is bad practice!
675.9	BETTER indexing, however it is to be done!	GLINKA::GREENE		`Tue Jul 21 1987 10:54`	28
	Our CUSTOMERS (remember them?) find that our documentation (for both hardware and software) is one of our real problem areas. I refer here to customers who use our documentation manuals, not customers who will be using DOCUMENT -- although they certainly will have some of the same concerns, for both their own use and their readers.
	"Poor indexes" [or indices -- what's proper these days?] is the MOST common complaint about DEC documentation.
	I am not knowledgeable enough about the various options (past, present, or future) for creating an index to "vote." But it seems essential that whatever tools can facilitate better indexing be created. If customers can't FIND what they need in the documentation, then it doesn't really matter if it is clearly written -- or even there at all!
	As for the concern about how many ways one could try to look up "print," that IS a serious problem. As the customers have complained (and no doubt many of us have experienced), "you have to know the correct term already before you can find it in the index..."
	DOCUMENT is great! Let's not forget the use of the end product.
	Penelope
675.10		AUTHOR::WELLCOME	Steve	`Tue Jul 21 1987 11:01`	7
	Re: .8 I guess we all work differently; I generally save the index until the end, because then I know exactly what has to go into it. I find if I try to do it as I go along I have to rewrite and reorganize so much it ends up being more work. The important thing is, if one does do the index at the end, DON'T do it as an afterthought! Leave plenty of time, whenever you do it.
675.11	And that includes the ones from "tech writers".	NCADC1::PEREZ	The sensitivity of a dung beetle.	`Wed Jul 22 1987 20:45`	19
	I don't wanna start a holy war over whether or not you should do the index first, last, in the middle, etc. "Quite frankly, I don't give a damn."
	<SET FLAME - FACE REALITY>
	However, I agree with .8 or .9 (whichever) -- our indexes are poor -- which is exactly what I said in the original note. It's fine to tell a bunch of programmers to "leave plenty of time to do the index" but SINCE THE WHOLE MANUAL IS CONSIDERED AN AFTERTHOUGHT AND WASTE OF TIME it's gonna be tough. Without some automated way to do at least a mediocre index, there won't be ANY INDEX. I haven't seen a project go out of here in two years with indexes on ANY OF THE DOCUMENTS (except for the "poor" ones I did from the heads and subheads). The only reason there's a Table of Contents is because it's automated!
	D
675.12	Clean one's own house first.	BUNSUP::LITTLE	Todd Little, NYA SWS, 323-4475	`Sun Jul 26 1987 15:00`	15
	re: .11
	The following phrase, which recently has meant much to me, is somewhat applicable: "Poor planning on your part doesn't necessarily constitute an emergency on my part."
	A better way to say what I'm getting at is: Don't complain about the tools and talk about religion when the problem is within your own realm to solve, and in the case of bad planning, should be solved.
	-tl
	An X-DCC who feels that 1 in 100 project managers/leaders knows what the word "plan" means.
675.13	I don't make the news, I just report it.	NCADC1::PEREZ	The sensitivity of a dung beetle.	`Tue Jul 28 1987 15:25`	13
	RE .12
	> A better way to say what I'm getting at is: Don't complain about
	> the tools and talk about religion when the problem is within your
	> own realm to solve, and in the case of bad planning, should be solved.
	I wasn't complaining about DOCUMENT. I like DOCUMENT. I was simply relaying a concern expressed by member(s) of the Sales Support staff. I'd like INDEX to permute again, but other than that, I figure the developers will do what's best.
	D