Conference thebay::joyoflex

Title:The Joy of Lex
Notice:A Notes File even your grammar could love
Moderator:THEBAY::SYSTEM
Created:Fri Feb 28 1986
Last Modified:Mon Jun 02 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1192
Total number of notes:42769

525.0. "Need <boring> word eliminator!" by PHDVAX::MURRAY () Wed Jun 01 1988 03:40

Fellow Logophiles,

I remember reading (and I don't know where) about a program that would strip
out "boring" words from a block of text.  A "boring" word is one that does
not convey much in the way of non-contextual information.  Such a program
might perform the following transformation of the text above:

reading program strip boring block text boring convey non-contextual
information program perform following transformation
(end of transformation)

Basically, it gets rid of "a", "the", "of", and the other 500 or so like
words and, like DECspell, lets one add personal entries.
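
A minimal sketch of the filter described above, in Python. It is not the program the note remembers reading about: the names `BORING_WORDS` and `strip_boring` are hypothetical, and the word list is a tiny illustrative sample rather than the "500 or so" words mentioned. The `personal_entries` argument stands in for the DECspell-style personal additions.

```python
import re

# Illustrative sample only; a real list would hold several hundred words.
BORING_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "it",
    "that", "from", "out", "as", "or", "not",
}

def strip_boring(text, personal_entries=()):
    """Return the words of `text`, in order, minus the boring ones.

    `personal_entries` holds extra words the user also wants dropped.
    Matching is case-insensitive; the original spelling is preserved.
    """
    boring = BORING_WORDS | {w.lower() for w in personal_entries}
    # A word starts with a letter and may contain apostrophes or hyphens,
    # so "don't" and "non-contextual" each survive as a single token.
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", text)
    return [w for w in words if w.lower() not in boring]

print(strip_boring("Strip the boring words out of a block of text"))
# prints ['Strip', 'boring', 'words', 'block', 'text']
```

Passing `personal_entries=["block"]` to the same call would also drop "block" from the result.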

I need it to pull keyword candidates out of sentences and phrases.  No
lectures, please, on why such a program can't possibly be useful.  Any pointers
appreciated as long as they haven't been run through the boring word program!!!!
;-)  (-:

If I don't hear anything in a few weeks, I'll post my version here sometime
later.

Thanks,

Rich Murray
525.1. ""Doesn't auger well, I fear" he said, giving it his awl." by LAMHRA::WHORLOW (I Came,I Saw,I concurred) Wed Jun 01 1988 03:56 (21 lines)
    G'day,
    
>    I remember reading (and I don't know where) about a program that would strip
>out "boring" words from a block of text.  A "boring" word is one that does
>NOT convey much in the way of non-contextual information.  Such a program
>might perform the following transformation of the text above:
>
>reading program strip boring block text boring [NOT]convey non-contextual
>information program perform following transformation
>(end of transformation)
 
    It may be boring to the program, but dropping the word "not" has altered
    the sense of the paragraph.
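
    One way the objection could be met, sketched in Python under the
    assumption that the filter accepts a "keep" list which overrides the
    stop list. The names `STOP`, `NEGATIONS` and `keywords` are made up
    for illustration and come from no existing program.

```python
import re

# Tiny illustrative stop list; "not" is on it, as in the sample in .0.
STOP = {"a", "the", "of", "is", "does", "not", "to", "in"}

# Words that reverse the sense of a sentence and may be worth keeping.
NEGATIONS = {"not", "no", "never"}

def keywords(text, keep=frozenset()):
    """Drop stop words from `text`, except any that appear in `keep`."""
    drop = STOP - {w.lower() for w in keep}
    return [w for w in re.findall(r"[A-Za-z']+", text)
            if w.lower() not in drop]

sentence = "The word does not convey information"
print(keywords(sentence))                  # "not" is dropped
print(keywords(sentence, keep=NEGATIONS))  # "not" survives
```

    Whether negations belong among the keyword candidates depends on what
    the candidates are used for; the sketch only shows that the choice can
    be left to the user.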
    
    Is this ok? Real interested in the WHYs of such a program.
    
    djw
    ps Reminds me of the Goodies and their scenes at the "World Boring
    Championships" :-)
    
    
525.2. "Russians?" by WRONGO::PARMENTER (C'est quoi?) Thu Jun 02 1988 17:43 (5 lines)
    
    Is the program designed to mimic Russians, who don't have articles
    in their language, or even a present tense "to be?"
    
    David
525.3. "It's amazing how many logophiles can't read." by SMURF::BINDER (A complicated and secret quotidian existence) Wed Jun 08 1988 19:47 (18 lines)
Re: .1, .2

The explanation in .0 was very explicit as to the reasons for such a 
program.  It can serve to extract candidates for linguistic keywords 
from any arbitrary text.  Such extraction should prove of great utility 
to someone who is developing a parser to deal with natural-language 
entry from a naïve computer user.  Such a parser exists in a primitive
form in the text adventure games marketed by Infocom.  In its speed and
compactness, this parser is amazingly intelligent, handling such
constructions as these: 

	> Pirate, please tell the parrot to attack the ugly fish with a 
	gaff.

	> Take the small rusty knife out of the red box and then cut the
	rope with the knife. 

- Dick
525.4. "In defence, Your Honour, I wish to take the stand..." by LAMHRA::WHORLOW (I Came,I Saw,I concurred) Thu Jun 09 1988 03:11 (17 lines)
    G'day,
    
>       It may be boring to the program, but dropping the word "not" has altered
>   the sense of the paragraph.
    
>   Is this ok? Real interested in the WHYs of such a program.
    
    Now I _can_ read  - quite well as a matter of fact. Incidentally, I don't
    believe the question about negation was answered. Ok, I asked for the
    WHYs, which you felt were already explained but which I apparently did
    not. Now you have explained, indirectly - thank you. The requirement
    was to extract keywords for input to a parser for further analysis
    without the need to wade through words of little interest. Its use
    might be in games, or in natural language projects. Very useful
    for those in that field, I'm sure.
    
    Derek