T.R | Title | User | Personal Name | Date | Lines |
---|
4510.1 | | HERON::KAISER | | Thu Mar 28 1996 02:49 | 16 |
| > Does anyone have the ability to extract off one of the Internet Server
> ALL the Internal URLs?
As soon as Alta Vista is cloned for internal use.
> We have the Easynotes_Conference to list all the available
> Notesfiles/Conferences.
EASYNOTES_CONFERENCE doesn't list all available conferences, only the ones
that people have thought to announce there. There are many conferences not
mentioned there. But a good web spider will find all interlinked web
pages. It'll still be possible to set up an isolated island of web pages,
but as soon as someone outside the island links to the island ... wham!
they're indexable.
___Pete
|
4510.2 | | VANGA::KERRELL | salva res est | Thu Mar 28 1996 04:01 | 6 |
| re.1:
There's an internal search engine off Digital's internal home page - does this
not use a web spider to build the index?
Dave.
|
4510.3 | | CIM::LOREN | Loren Konkus | Thu Mar 28 1996 05:56 | 4 |
| I find that the AIT Announcement Server is pretty useful for finding
internal stuff. See:
http://www-ad.mso.dec.com/announce/pa-toc.html
|
4510.4 | Digital's Internal World-Wide Web index (DWI) | LGP30::FLEISCHER | without vision the people perish (DTN 227-3978, TAY1) | Thu Mar 28 1996 06:30 | 20 |
| re Note 4510.2 by VANGA::KERRELL:
> There's an internal search engine off Digital's internal home page - does this
> not use a web spider to build the index?
I think you're thinking of the Digital Web Indexer,
http://src-www.pa.dec.com/cgi-bin/dwi, which uses some of the
same technology as Alta Vista.
However, the Digital Web Indexer does not use a spider but
relies on distributed gatherer programs to send information
to the index. (This is similar to the Harvest architecture,
and similar to the enterprise catalog server recently
announced by Netscape.)
Since it depends upon gatherer programs outside of the direct
control of the maintainers of the index, coverage is
inconsistent.
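
To illustrate why a push model gives uneven coverage, here is a minimal
sketch in Python. It is not the Digital Web Indexer's code; the class and
function names (CentralIndex, gather) and the example hosts are
hypothetical. The index only ever learns about pages that some gatherer
chooses to send.

```python
from dataclasses import dataclass, field


@dataclass
class CentralIndex:
    """Hypothetical index that only knows what gatherers push to it."""
    entries: dict = field(default_factory=dict)  # url -> summary text

    def submit(self, records):
        """Accept a batch of (url, summary) pairs pushed by one gatherer."""
        for url, summary in records:
            self.entries[url] = summary

    def coverage(self, known_urls):
        """Fraction of known_urls that some gatherer has reported."""
        known = set(known_urls)
        if not known:
            return 0.0
        return len(known.intersection(self.entries)) / len(known)


def gather(site_pages):
    """Hypothetical gatherer run by a site owner: summarize local pages."""
    return [(url, text[:80]) for url, text in site_pages.items()]


# Site A runs a gatherer; site B never installs one, so the index misses it.
site_a = {"http://a.example/index.html": "Group A home page"}
site_b = {"http://b.example/index.html": "Group B home page"}

index = CentralIndex()
index.submit(gather(site_a))
fraction = index.coverage(list(site_a) + list(site_b))
```

A spider avoids this dependence by pulling pages itself, at the cost of
having to discover them through links.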
Bob
|
4510.5 | personal spider / indexer ??? | SAYER::ELMORE | through the looking glass | Thu Mar 28 1996 17:14 | 18 |
| I'd like a slight "spider" variation. I would like to find a program
that starts at a given WEB page, or, optionally, takes your own
hotlinks/bookmarks, and traverses, then indexes every linked page from
there.
Ideally you could specify "how many levels deep" to go.
I've seen [somewhere] some software that wakes up to look at
hotlinks/bookmarks/history URLs to see if pages have been recently
updated. That's close, but I'm looking for an indexer too. My
bookmarks are already basically what I need, but I can never remember
what bookmark contains what piece of information...therefore my
[personal] need for the [personal] index.
I'm sure I could write a spider script of sorts that follows URLs
around, but not the indexer.
--Steve
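
A minimal sketch of the tool described above, in Python: a breadth-first
crawl from a set of starting URLs (bookmarks, say), limited to a given
number of levels, feeding a simple word-to-URL index. The names
crawl_and_index and search are hypothetical, and the page fetcher is
passed in as a function so the sketch does not assume any particular
network setup.

```python
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
import re


class PageParser(HTMLParser):
    """Collects link targets and text content from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def crawl_and_index(start_urls, fetch, max_depth=2):
    """Crawl breadth-first from start_urls, following links max_depth levels.

    fetch(url) must return the page's HTML as a string, or None on failure.
    Returns a dict mapping each lowercase word to the set of URLs using it.
    """
    index = defaultdict(set)
    starts = list(dict.fromkeys(start_urls))  # drop duplicate bookmarks
    seen = set(starts)
    queue = deque((url, 0) for url in starts)
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        for word in re.findall(r"[a-z0-9]+", " ".join(parser.text).lower()):
            index[word].add(url)
        if depth < max_depth:
            for href in parser.links:
                # Resolve relative links and ignore #fragment differences.
                target = urldefrag(urljoin(url, href)).url
                if target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))
```

For real use, fetch could wrap urllib.request.urlopen and return None on
errors; for experimenting, a dictionary's get method works as a stand-in
fetcher over canned pages.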
|
4510.6 | Intranet Alta Vista Trial Offer! | LJSRV2::POWELL | | Tue Apr 02 1996 11:33 | 8 |
| You may have noticed that AltaVista is now under test internally.
Try URL: altavista.pa.dec.com/ and see what happens!
I just noticed this entry this week, but don't know how long the test
will run. Looks like we're really going to make Alta Vista a product.
Good luck!
|
4510.7 | yellow pages idea great | SALES::ICS::DIRICO | | Fri May 10 1996 14:56 | 13 |
| All of the inconsistent search stuff aside, since this web stuff took
off quickly and now is quite large to pull in and
control/maintain/organize...
I love the idea of a yellow pages of intranet URLs. I think as the web
becomes a more vital way to communicate within the company as
notesfiles/public directories/email decrease...the yellow pages is a
key first step to build from.
My first thought is that someone from Corporate Communications
should publish this, but then again, maybe not. Any other thoughts?
Mary Beth
|
4510.8 | | QUARK::LIONEL | Free advice is worth every cent | Fri May 10 1996 15:03 | 5 |
| Re: .7
See .3
Steve
|
4510.9 | | TENNIS::KAM | Kam WWSE 714/261.4133 DTN/535.4133 IVO | Fri May 10 1996 15:06 | 5 |
| I'd like to see a yellow pages cuz I can't do a search if I don't know
what phrase to supply the search engine. I saw some URLs posted in a
Notesfile. I went to Altavista.pa.dec.com and searched for the
information and it didn't find it. Therefore, I'm missing some
valuable information.
|
4510.10 | not quite mission-critical yet | LGP30::FLEISCHER | without vision the people perish (DTN 227-3978, TAY1) | Fri May 10 1996 16:10 | 10 |
| re Note 4510.9 by TENNIS::KAM:
> I went to Altavista.pa.dec.com and searched for the
> information and it didn't find it.
I don't believe that this is maintained as a production
system, and thus may not always be available, may not be
updated very often (or at all), etc.
Bob
|
4510.11 | | QUARK::LIONEL | Free advice is worth every cent | Fri May 10 1996 17:04 | 3 |
| Kam, have you TRIED the AIT Announcement Server?
Steve
|
4510.12 | | TENNIS::KAM | Kam WWSE 714/261.4133 DTN/535.4133 IVO | Fri May 10 1996 17:46 | 6 |
| I'm looking for a Digital ONLY Yellow Pages. This Company has so much
information that I don't want it cluttered with information outside
this company.
Regards,
|
4510.13 | | plugh.ibg.ljo.dec.com::needle | Money talks. Mine says "Good-Bye!" | Fri May 10 1996 18:20 | 6 |
| The information at altavista.pa.dec.com is in beta test. It's not maintained
and is not a public service yet. When it does become public, it would be
reasonable to expect a service of the quality of altavista.digital.com for
the intranet.
j.
|
4510.14 | exactly what would you like to see? | LGP30::FLEISCHER | without vision the people perish (DTN 227-3978, TAY1) | Fri May 10 1996 19:55 | 68 |
| re Note 4510.0 by tennis.ivo.dec.com::KAM:
> We have the Easynotes_Conference to list all the available
> Notesfiles/Conferences. Can we create some mechanism to keep track of
> all the internal URLs?
What that mechanism might be depends upon what you mean by
"all the internal URLs".
Note that the Easynotes_Conference conference does not list
all of the internal topics and replies, it just lists the
conferences.
We do have separate services that actually search the content
of most of the conferences (e.g., Comet at
http://encke.alf.dec.com/cgi/v4.2 ).
You use the former (Easynotes_Conference) when looking for
an appropriate conference. It identifies conference by
overall topic.
You use the latter (Comet) when looking for specific notes on
a very specific subject, regardless of the conference
containing them.
I suspect that with the Web we need both kinds of service.
The nature of the Web makes the analogue of the former, an
index of topical or thematic collections of pages, a little
harder to define than does DEC Notes. However, it probably
should be an index of home pages (or what we called "front
pages", as in the first page of a magazine or book) with a
little description of the overall topic or theme of the
service to which that page represents the entry. This is
what the Announcement Directory set out to be.
The latter is simply AltaVista -- an index of all web pages
(not password or otherwise protected) (note that the Comet
URL listed above also provides an index of most Digital web
pages).
> If what I am looking for is not available, I would like to create
> something that will list all the available internally URLs, whether
> their personnel, private, or public URLs.
So the question remains: do you want to index every page as
an entry in this list, or do you want to list every
*collection* of related pages (recognizing that some
significant "collections" may only be one page)?
The former is a bit easier to do -- it can be done
automatically, which is what AltaVista does.
The latter is harder because it requires, for now, human
intelligence to select the things to be registered, either in
the form of a central staff, or through conventions followed
by all who publish on the internal network (e.g., registering
your own collections).
If you'd like to do the latter, and implement a more robust
version of the Announcement Directory, I'd be glad to see you
do it and I'd offer any help I can. There's a product in
there, I'm sure (a number of similar products have been
announced). But hurry -- we in the group of which I am a
part are likely to get our notices this coming week.
Bob
|
4510.15 | | QUARK::LIONEL | Free advice is worth every cent | Fri May 10 1996 22:10 | 5 |
| Re: .12
Ok, so now I KNOW you haven't looked at it.
Steve
|
4510.16 | http://www-ad.mso.dec.com/announce/pa-toc.html | LGP30::FLEISCHER | without vision the people perish (DTN 227-3978, TAY1) | Sat May 11 1996 08:45 | 28 |
| follow-on to Note 4510.14:
One obvious comparison to the Announcement Directory is the
Yahoo service. I hesitate to make this comparison because
Yahoo has full time people, essentially librarians working
in cyberspace, who carefully construct and maintain a rich
classification hierarchy.
The Announcement Directory has no staff to do this, so the
only classifications provided are those that can be
automatically determined, e.g., whether a URL is owned by
Digital or not, and whether it is external to Digital or on
Digital's Intranet. We can also do obvious sorts, such as by
date and title.
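
As a rough illustration of this kind of automatic classification, here is
a Python sketch. It is not the Announcement Directory's code. The rule
that hosts under dec.com are on the Intranet and hosts under digital.com
are Digital-owned but external is an assumption inferred from the URLs
quoted in this topic, and the entry fields (url, title, date) are
hypothetical.

```python
from urllib.parse import urlsplit

# Assumed host suffixes, inferred from URLs mentioned in this topic.
INTRANET_SUFFIX = ".dec.com"
DIGITAL_SUFFIXES = (".dec.com", ".digital.com")


def classify(url):
    """Classify a full URL (scheme included) by its host name alone."""
    host = (urlsplit(url).hostname or "").lower()
    dotted = "." + host  # so "dec.com" itself also matches ".dec.com"
    return {
        "digital_owned": dotted.endswith(DIGITAL_SUFFIXES),
        "intranet": dotted.endswith(INTRANET_SUFFIX),
    }


def sort_entries(entries):
    """Sort entries by date, then by title (dates as ISO strings)."""
    return sorted(entries, key=lambda e: (e["date"], e["title"].lower()))


entries = [
    {"url": "http://www.cio.com/WebMaster/0596_field.html",
     "title": "Finding the Way", "date": "1996-05-14"},
    {"url": "http://www-ad.mso.dec.com/announce/pa-toc.html",
     "title": "Announcement Directory", "date": "1996-03-28"},
]
for entry in sort_entries(entries):
    entry.update(classify(entry["url"]))
```

Anything beyond this, such as assigning a subject category, needs either
human judgment or much heavier text processing.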
(There are opportunities for the application of advanced
natural language processing techniques here.)
It was hoped that the Digital community would provide
informal maintenance of the entries (anybody can add, and
actually anybody can delete and replace an entry). To some
extent this happens, but it is far from being as
well-maintained as Yahoo. (Nobody has it in their job
description to maintain it.) On the other hand, its content
is probably as well-maintained as Easynet_Conferences.
Bob
|
4510.17 | some references | LGP30::FLEISCHER | without vision the people perish (DTN 227-3978, TAY1) | Tue May 14 1996 12:49 | 16 |
| More follow-on:
Two articles have recently appeared on the Web that address
aspects of this subject. One is:
http://www.cio.com/WebMaster/0596_field.html
-- "Finding the Way", by former DECie Tim Horgan
Another is:
http://gnn.com/wr/96/05/10/webarch/index.html
-- "Revenge of the Librarians"
Bob
|