
Conference gyro::internet_toolss

Title: Internet Tools
Notice: Report ALL NETSCAPE Problems directly to [email protected].rnet? Read note 448.L for beginner information.
Moderator: teco.mro.dec.com::tecotoo.mro.dec.com::mayer
Created: Fri Jun 25 1993
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 4714
Total number of notes: 40609

3934.0. "Web Crawler/Spider/Search/AltaVist questions/comments" by VAXCPU::michaud (Jeff Michaud - ObjectBroker) Mon Aug 05 1996 20:34

T.R | Title | User | Personal Name | Date | Lines
3934.1 | | RANGER::WASSER | John A. Wasser | Tue Aug 06 1996 13:34 | 11
3934.2 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Tue Aug 06 1996 13:49 | 21
3934.3 | | RANGER::WASSER | John A. Wasser | Wed Aug 07 1996 10:54 | 20
3934.4 | Typos -- Corrected in current build | NETRIX::"[email protected]" | Chris Lord | Wed Aug 07 1996 18:12 | 8
3934.5 | | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Wed Aug 07 1996 20:28 | 52
3934.6 | additional notes | TUXEDO::ROSENBAUM | Rich Rosenbaum | Fri Aug 09 1996 00:31 | 14
3934.7 | | MR1MI1::VILCANS | | Fri Aug 09 1996 14:18 | 9
3934.8 | Warning: Crawler on the Intranet is following links w/query arguments! | VAXCPU::michaud | Jeff Michaud - ObjectBroker | Mon Mar 24 1997 16:30 | 27
>     - at some point the crawler will have the (optional) ability to 
>       crawl link URLs that contain "?" query arguments.  This is necessary 
>       to support Lotus Domino servers which use query syntax for 
>       perfectly ordinary pages (well not _completely_ ordinary, they are
>       dynamically generated from Lotus databases).

	Looks like this support is now there.  Someone's trying to index
	my site, including the generated URLs that contain query
	arguments.  It's probably going to take them 9 million hits
	(or more) to get all the combinations of authors, notesfiles,
	and personal names.

	The user agent that I'm getting hit by is:

User-Agent: AltaVista Intranet V1.0 sbu_antony [email protected]

	and I've sent mail to them letting them know that they had better
	have a lot of disk space (and time: at the rate they are going,
	I compute it's going to take them 5+ years to finish :-).
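
	For a sense of scale, here is a rough back-of-the-envelope sketch
	relating the two figures above (9 million hits, 5 years).  The
	function names are made up for illustration, and the result is only
	the average pace those two numbers imply, not a measured crawl rate.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignores leap years


def implied_interval_seconds(total_hits, years):
    """Average gap between hits if total_hits are spread over `years`."""
    return years * SECONDS_PER_YEAR / total_hits


def years_to_finish(total_hits, seconds_per_hit):
    """Time needed to make total_hits at one hit every seconds_per_hit."""
    return total_hits * seconds_per_hit / SECONDS_PER_YEAR


# Figures taken from the note: about 9 million hits over about 5 years.
interval = implied_interval_seconds(9_000_000, 5)
print(f"about one hit every {interval:.1f} seconds")
```

	In other words, 9 million hits in 5 years works out to roughly one
	request every 17 to 18 seconds, around the clock.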

	Can't the crawler follow links with query arguments only if
	the server *is* a "Lotus Domino" server?
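
	Something along these lines is what I have in mind.  This is only a
	sketch, not how the AltaVista crawler actually works: should_follow
	is a made-up name, and it assumes a Domino server identifies itself
	with "Domino" somewhere in its HTTP Server: response header, which
	would need to be checked against real servers.

```python
from urllib.parse import urlsplit


def should_follow(url, server_header):
    """Decide whether a crawler should follow `url`.

    Links without a "?" query are always followed.  Links with a query
    are followed only when the Server: header (may be None) looks like
    Lotus Domino.  The "domino" substring test is an assumption.
    """
    has_query = bool(urlsplit(url).query)
    if not has_query:
        return True
    return server_header is not None and "domino" in server_header.lower()


# Hypothetical URLs and header values, for illustration only.
print(should_follow("http://example.invalid/page.html", None))
print(should_follow("http://example.invalid/notes?author=x", "Hypothetical-HTTPd/1.0"))
print(should_follow("http://example.invalid/db.nsf?OpenView", "Lotus-Domino/4.5"))
```

	A per-server switch like this would keep query crawling available
	for Domino sites without turning every other site's generated pages
	into crawl targets.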

	I'd add support for .txt files to my http server so I can support
	a robots.txt, but it's too late now as it appears the crawler
	only tries to fetch robots.txt once, no matter how many links or
	how much time has passed ...
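
	For reference, a minimal sketch of what such a robots.txt could look
	like and how a well-behaved robot would interpret it, using Python's
	standard urllib.robotparser.  The /notes path prefix and the
	example.invalid host are placeholders, not my server's real layout.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every robot out of the generated pages
# under /notes (an assumed prefix), leave everything else crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /notes
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


AGENT = "AltaVista Intranet V1.0"
parser = build_parser(ROBOTS_TXT)

# Generated page with query arguments: should be refused.
print(parser.can_fetch(AGENT, "http://example.invalid/notes?author=michaud"))
# Ordinary static page: should be allowed.
print(parser.can_fetch(AGENT, "http://example.invalid/index.html"))
```

	Of course this only helps if the robot re-fetches robots.txt from
	time to time, which is exactly the problem here.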
3934.9 | | teco.mro.dec.com::tecotoo.mro.dec.com::mayer | Danny Mayer | Tue Mar 25 1997 08:49 | 8
>        I'd add support for .txt files to my http server so I can support
>        a robots.txt, but it's too late now as it appears the crawler
>        only tries to fetch robots.txt once, no matter how many links or
>        how much time has passed ...

	How about cutting the connection so that it has to start again?

		Danny