Conference gyro::internet_toolss

Title:Internet Tools
Notice:Report ALL NETSCAPE Problems directly to [email protected].rnet? Read note 448.L for beginner information.
Moderator:teco.mro.dec.com::tecotoo.mro.dec.com::mayer
Created:Fri Jun 25 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:4714
Total number of notes:40609

4569.0. "The proxy cache efficiency discussion topic" by UTRUST::KUIJPER (Caught in a World-Wide-Web !) Tue Mar 25 1997 09:09

    Hi,
    
    I'm supporting a couple of firewalls at some large Dutch customers.
    The topic of interest is WWW caching.
    Some are based on the good old CERN proxy (as part of the AltaVista
    firewall), some run Netscape 2.0/2.5 and some run Purveyor.
    
    I am looking for figures, recommendations etc. concerning caching in
    these proxy servers.
    
    My largest customer has a 10 Mb link to the Internet, about 300000
    requests per day for a total of about 2 Gb of traffic.
    They use 140 Netscape threads (version 2.5) on Digital UNIX with a 2 
    drive cache (RZ28) with a maximum size of 2 Gb. Only http is being cached.
    
    They have a cache efficiency of about 10%.
    With such a figure, I wonder whether caching is actually useful given
    the huge I/O load on the cache disks for lookups (each disk sustains
    about 60 I/Os per second, and I had to create the domain with a
    large preallocated metadata size to prevent fragmentation).
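
    A rough back-of-envelope check on these figures, as a sketch only.
    It assumes "2 Gb" means 2e9 bytes per day and "10 Mb" means
    10e6 bit/s; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def traffic_summary(requests_per_day, bytes_per_day, link_bits_per_sec):
    """Return daily averages derived from the proxy's request and byte counts."""
    return {
        # Mean request rate over the whole day (peaks will be higher).
        "requests_per_sec": requests_per_day / SECONDS_PER_DAY,
        # Mean size of one response.
        "avg_object_bytes": bytes_per_day / requests_per_day,
        # Fraction of the link used on average, 0.0 to 1.0.
        "link_utilization": (bytes_per_day * 8 / SECONDS_PER_DAY) / link_bits_per_sec,
    }


# Figures from the note above; the unit interpretation is an assumption.
summary = traffic_summary(300_000, 2e9, 10e6)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

    On these assumptions the averages come out at about 3.5 requests
    per second, about 6.7 KB per request, and under 2% mean utilization
    of the link. Averages hide the busy hour, so this says nothing
    about peak load.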
    
    I have done some analysis on the data, but can not find an obvious way
    to increase the efficiency.
    
    Questions:
    1) What is the general opinion on caching, that is, at which point
       should one cache: at 5% efficiency, at 10%, at 40%? Does it depend
       on line speed?
    2) What are recommendations for the cache size ? If I increase it
       further, file lookups will increase and the disks might well
       become saturated.
    3) What steps can be taken to increase cache efficiency. I am
       thinking of:
    	- caching just .gif and .jpg files (they have the best hit ratio)
    	- no caching but just preloading the cache based on yesterday's
    	  access data (only useful for the Netscape server)
    	- using different cache scenarios, such as the distributed
    	  SQUID/Harvest cache.
    4) What are normal figures for cache efficiency ?
    
    I know that there is/has been a study on caching, but could not find
    the report. I welcome all remarks, recommendations, thoughts etc.
    
    Thanks,
    Frank, NSIS The Netherlands
T.R  Title  User  Personal Name  Date  Lines
4569.1. by teco.mro.dec.com::tecotoo.mro.dec.com::mayer (Danny Mayer) Tue Mar 25 1997 11:59 (7 lines)
	I seem to remember that Jeff Mogul had done some analysis of caching
  on the proxy server in Palo Alto and that the general conclusion was that
  it wasn't really worth it, the variety of requests was just too broad to
  get much efficiency by using caching.  I believe that Palo Alto no longer
  caches pages in the proxy server.

		Danny

4569.2. by VAXCPU::michaud (Jeff Michaud - ObjectBroker) Tue Mar 25 1997 12:27 (11 lines)
> 	I seem to remember that Jeff Mogul had done some analysis of caching
>   on the proxy server in Palo Alto and that the general conclusion was that
>   it wasn't really worth it, the variety of requests was just too broad to
>   get much efficiency by using caching.

	Did he also do that analysis on the proxy servers as well?  I
	remember reading in here that someone (Jeff?) had done an
	analysis on the requests coming into the AltaVista search
	engine and reached the conclusion that it was not worthwhile
	to cache search results for the same reasons (an AltaVista
	NOTES search could probably find that string of notes :-)

4569.3. "Re: The proxy cache efficiency discussion topic" by QUABBI::"[email protected]" (Ong Beng Hui) Tue Mar 25 1997 23:38 (62 lines)
Hi,

>    I am looking for figures, recommendations etc. concerning caching in
>    these proxy servers.
>    
>    My largest customer has a 10 Mb link to the Internet, about 300000
>    requests per day for a total of about 2 Gb of traffic.
>    They use 140 Netscape threads (version 2.5) on Digital UNIX with a 2 
>    drive cache (RZ28) with a maximum size of 2 Gb. Only http is being cached.
 
I have a customer (an ISP) here with 2 x 4100 and 2 x 8200 as proxy
caches. The total storage of the four machines is around 100G and they
are running Harvest from Network Appliance (www.netapp.com).
Requests run at around 1.5 million per day per proxy. Their cache
efficiency is around 40-50%.

There are two reasons for deploying a proxy cache: content
filtering and caching. A single international E1 costs about as much
per month as a loaded 4100, so a 50% bandwidth saving can easily be
translated into one new 4100 every month.

>    I have done some analysis on the data, but can not find an obvious way
>    to increase the efficiency.
>    
>    Questions:
>    1) What is the general opinion on caching, that is at which point
>       should one cache. At 5% efficiency, at 10%, at 40%. Depending
>       on linespeed ?

Sorry, I didn't get you on this...

>    2) What are recommendations for the cache size ? If I increase it
>       further, file lookups will increase and the disks might well
>       become saturated.

Squid and Harvest store their index in memory. I am not sure if
Netscape does that. From experience with Squid and Harvest, the
cache size you can run is directly proportional to the amount of
memory you have.
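
To make the proportionality concrete, here is a rough sizing sketch.
The 100 bytes of index per cached object is a purely hypothetical
figure chosen for illustration, not a measured value for Squid,
Harvest or Netscape, and the function name is made up. The average
object size is taken from the base note (2 Gb over 300000 requests).

```python
def index_memory_bytes(cache_bytes, avg_object_bytes, index_bytes_per_object):
    """Estimate RAM needed for an in-memory index over an on-disk cache."""
    objects_on_disk = cache_bytes / avg_object_bytes
    return objects_on_disk * index_bytes_per_object


avg_object = 2e9 / 300_000  # about 6.7 KB, from the base note's figures
# HYPOTHETICAL per-object index cost; measure your own server instead.
assumed_index_entry = 100

for cache_gb in (2, 10, 100):
    ram = index_memory_bytes(cache_gb * 1e9, avg_object, assumed_index_entry)
    print(f"{cache_gb:>3} GB cache -> about {ram / 1e6:.0f} MB of index")
```

Whatever the real per-object cost is, the index grows linearly with
the number of objects on disk, so memory puts a ceiling on usable
cache size.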

>    3) What steps can be taken to increase cache efficiency. I am
>       thinking of:
>    	- caching just .gif and .jpg files (they have the best hit ratio)
>    	- no caching but just preloading the cache based on yesterday's
>    	  access data (only useful for the Netscape server)
>    	- using different cache scenarios, such as the distributed
>    	  SQUID/Harvest cache.

You might want to tune your cache expiry factor. Squid and Harvest are
pretty good proxy caches.
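
For readers unfamiliar with the idea, here is a sketch of how an
expiry factor typically works. I am assuming the usual last-modified
heuristic: an object that had not changed for a long time when it was
fetched is presumed to stay valid for some fraction of that time. The
function and parameter names are invented for illustration and are not
taken from any product's configuration.

```python
def is_fresh(now, fetched_at, last_modified, lm_factor=0.1, max_age=7 * 86400):
    """Return True if a cached object may be served without revalidation.

    All times are in seconds since the epoch. lm_factor and max_age are
    illustrative defaults, not recommendations.
    """
    # How old the object already was when the proxy fetched it.
    age_when_fetched = max(0, fetched_at - last_modified)
    # Presume it stays valid for a fraction of that age, up to a cap.
    lifetime = min(lm_factor * age_when_fetched, max_age)
    return (now - fetched_at) <= lifetime


DAY = 86400
fetched = 10 * DAY  # object was 10 days old when it was fetched
print(is_fresh(fetched + 3600, fetched, 0))                 # one hour later
print(is_fresh(fetched + 2 * DAY, fetched, 0))              # two days later
print(is_fresh(fetched + 2 * DAY, fetched, 0, lm_factor=0.5))
```

A larger factor keeps objects longer and raises the hit rate, at the
price of a higher chance of serving a stale page.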

>    4) What are normal figures for cache efficiency ?

I guess cache efficiency differs from one situation to another.

>    I know that there is/has been a study on caching, but could not find
>    the report. I welcome all remarks, recommendations, thoughts etc.

Check out http://www.nlanr.net/Cache and follow the links.


[posted by Notes-News gateway]

4569.4. "Re: The proxy cache efficiency discussion topic" by QUABBI::"[email protected]" (Jeffrey Mogul) Wed Mar 26 1997 21:38 (56 lines)
Danny Mayer wrote:
|>   I seem to remember that Jeff Mogul had done some analysis of caching
|>   on the proxy server in Palo Alto and that the general conclusion was that
|>   it wasn't really worth it, the variety of requests was just too broad to
|>   get much efficiency by using caching.

Not quite right.  I think you are confusing several things that I
said, perhaps not all in this newsgroup:

In article <[email protected]_tools>,
	[email protected] (Jeff Michaud - ObjectBroker) writes:
|> 
|> 	Did he also do that analysis on the proxy servers as well?  I
|> 	remember reading in here that someone (Jeff?) had done an
|> 	analysis on the requests coming into the AltaVista search
|> 	engine and reached the conclusion that it was not worthwhile
|> 	to cache search results for the same reasons (an AltaVista
|> 	NOTES search could probably find that string of notes :-)

(1) Caching AltaVista responses (or other search-engine responses)
would not yield much benefit.  I simulated a perfect cache over
a 24-hour trace of AltaVista URLs from the middle of last year,
and it would probably get at most a 15% hit rate.  A more reasonably
sized cache would get closer to 10%, if my memory is right.  (Note
that AltaVista's internal cache holds query results, which is not
the same as a URL-keyed cache.)
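
A "perfect cache" simulation of this kind can be sketched in a few
lines. This is an assumption about the general method (an unbounded
cache in which every repeat of a URL counts as a hit), not the actual
simulator used for the AltaVista trace; the function name and the
sample trace are made up.

```python
def infinite_cache_hit_rate(urls):
    """Upper-bound hit rate: every URL seen before counts as a hit."""
    seen = set()
    hits = 0
    total = 0
    for url in urls:
        total += 1
        if url in seen:
            hits += 1
        else:
            seen.add(url)
    return hits / total if total else 0.0


# Tiny made-up trace; a real run would read URLs from a proxy log.
trace = ["/a", "/b", "/a", "/c", "/a", "/b"]
print(f"perfect-cache hit rate: {infinite_cache_hit_rate(trace):.0%}")
```

This ignores expiry and uncacheable responses, so a real cache can
only do worse than the figure it reports.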

(2) For many months, the Palo Alto proxies were run without any
caching, because the disk I/O requirements for handling such
large caches seemed to slow things down.  Apparently, the people
who run these proxies have now re-enabled caching.  I'm not sure
they have really solved the disk I/O delays, though.

(3) My analyses of other traces suggest that the best that a Web
cache can do is probably around 60%-70% hit rate.  This assumes
a very large cache, and that our traces are "representative" of
other pools of clients.  A 70% hit rate sounds good, but if (for
example) the average hit takes 1 second, and the average miss
takes 10 seconds, then the overall average retrieval time will
be almost four seconds; i.e., a 70% hit rate does not necessarily
correspond to a 70% improvement in perceived performance.  (The
situation is complicated by the possibility of reduced congestion.)
If the ratio of hit/miss costs is closer (which it seems to be,
in reality) then caching might be even less impressive to the
actual user.
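
The arithmetic in (3) written out as a small sketch. The function
name is illustrative; the 1-second and 10-second figures are the
example values used above, and the improvement figure assumes that a
fetch without any cache costs the same as a miss.

```python
def mean_retrieval_time(hit_rate, hit_secs, miss_secs):
    """Average time per request for a cache with the given hit rate."""
    return hit_rate * hit_secs + (1 - hit_rate) * miss_secs


with_cache = mean_retrieval_time(0.70, 1.0, 10.0)
without_cache = 10.0  # assumption: every request costs as much as a miss
improvement = 1 - with_cache / without_cache

print(f"mean retrieval time: {with_cache:.1f} s")
print(f"reduction versus no cache: {improvement:.0%}")
```

With these example values the mean comes out at about 3.7 seconds, a
reduction of roughly 63% rather than 70%, and the gap widens as the
hit and miss costs move closer together.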

One of the things I need to spend some time on (when I can
find some free time to spend!) is to figure out a way of
reliably benchmarking the response times provided by proxy
caches.  Looking at hit rates is not enough, because if the
cost of a "hit" is too high, it can actually make matters worse.
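
One way to see how a cache can make matters worse is a toy
break-even model. It is my own simplification, not a measured result:
a miss through the proxy is assumed to cost the direct fetch time plus
a fixed lookup overhead, and all names and numbers are hypothetical.

```python
def break_even_hit_rate(direct_secs, hit_secs, miss_overhead_secs):
    """Hit rate above which the proxy beats fetching directly.

    Model: a hit costs hit_secs; a miss costs direct_secs plus
    miss_overhead_secs. Returns None when a hit is no faster than a
    miss, in which case no hit rate can pay back the overhead.
    """
    gain_per_hit = direct_secs + miss_overhead_secs - hit_secs
    if gain_per_hit <= 0:
        return None
    return miss_overhead_secs / gain_per_hit


# Hypothetical timings in seconds.
print(break_even_hit_rate(direct_secs=2.0, hit_secs=1.0, miss_overhead_secs=0.5))
print(break_even_hit_rate(direct_secs=1.0, hit_secs=1.2, miss_overhead_secs=0.5))
```

A result above 1.0 means the break-even point cannot be reached. In
this model that happens whenever a hit from a busy cache disk is
slower than fetching the page directly.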

-Jeff

[posted by Notes-News gateway]

4569.5. "Re: The proxy cache efficiency discussion topic" by QUABBI::"[email protected]" (Steve Glassman) Wed Mar 26 1997 22:18 (18 lines)
Back when I set up the first caching proxy in Digital, I did a
bunch of measurement and wrote it up for the first WWW conference
in 1994.  You can read the paper:
  http://www.research.digital.com/SRC/personal/Steve_Glassman/CachingTheWeb/CachingTheWeb.html

The rough numbers I found (which seem to be supported by most other
studies) give a cache hit rate of roughly 33%.  If your companies really 
have good connectivity and a hit rate of only 10%, I would be tempted
to either turn off the cache or just make the cache smaller.  
Most of the cache hits come from a small fraction of the files.
Making the cache larger only very marginally improves the hit rate.
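
One way to check this on your own logs is to replay the request trace
through a simulated cache at several sizes. A minimal sketch, assuming
LRU replacement and counting capacity in objects rather than bytes
(both simplifications); the function name and the sample trace are
made up.

```python
from collections import OrderedDict


def lru_hit_rate(urls, capacity):
    """Replay a URL trace through an LRU cache holding `capacity` objects."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for url in urls:
        total += 1
        if url in cache:
            hits += 1
            cache.move_to_end(url)  # mark as most recently used
        else:
            cache[url] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / total if total else 0.0


# Tiny made-up trace; a real run would read URLs from the proxy access log.
trace = ["/a", "/b", "/a", "/c", "/a", "/b"]
for size in (1, 2, 3):
    print(f"capacity {size}: hit rate {lru_hit_rate(trace, size):.2f}")
```

If the hit rate stops rising well before the simulated size reaches
the real cache size, a smaller cache loses little.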

In my mind, the places that can justify caching are those with 
bad network connectivity and/or per-byte charges for network
access.

Steve
[posted by Notes-News gateway]

4569.6. "Information gathering, please providers papers" by UTRUST::KUIJPER (Caught in a World-Wide-Web !) Fri Mar 28 1997 16:48 (9 lines)
    RE: .4
    
    Jeff,
    
    Can you point me to the information you have gathered on this subject
    (a technical report, a white paper, a public submission)?
    
    Thanks,
    Frank

4569.7. "Re: The proxy cache efficiency discussion topic" by QUABBI::"[email protected]" (Steve Glassman) Fri Mar 28 1997 20:38 (21 lines)
My number for a cache hit rate, 1/3 (33%), and Jeff Mogul's
number, 70%, are not as contradictory as they first seem.  A
cache hit rate of 1/3 comes from a fairly lazy caching scheme
that simply caches the replies to actual requests and uses that 
result if another request for the same item comes along "soon 
enough".  My analysis showed a potential hit rate of about 2/3
if all of the items in the cache are "fresh enough".

This means that if the cache does automatic freshening
of cached pages it could get the hit rate up to 2/3.  Note
that keeping cached pages fresh takes extra bandwidth from
the cache to the servers because some of the freshened pages
won't be requested from the cache.

So, if the goal is maximum browser-perceived hit rate on the 
cache then a 2/3 hit rate is possible.  If the goal is minimum
bandwidth between the proxy and servers, then the hit rate is
closer to 1/3.

Steve
[posted by Notes-News gateway]