
Conference turris::digital_unix

Title:DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

9184.0. "Compression using pipe - what causes variations?" by MELEE::GERACE (Cindy Gerace @297-3884) Fri Mar 14 1997 09:26

    We are running V3.2c on an Alpha 8400 (huge machine - 8 cpus, 8gig of
    memory).  We have an Oracle database and export the data to a file
    using a pipe to compress it.  Yesterday the export file was 9gig, today
    it is 12gig.  The compression rate yesterday was 39%, today it was 20%. 
    Does anyone know what would cause the compression rate to vary from day
    to day?  We've had it as high as 42% and as low as 5%.  Since it really
    affects the size of the export file, we'd like to get as much
    compression as possible.
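
    (In case it matters, the pipeline is roughly the usual named-pipe
    arrangement -- this is a simplified sketch rather than our exact
    commands, and the pipe name, output file, and userid below are only
    illustrative:

    	mkfifo /tmp/exp_pipe                    # or: mknod /tmp/exp_pipe p
    	compress < /tmp/exp_pipe > export.dmp.Z &
    	exp userid=system/manager full=y file=/tmp/exp_pipe
    	rm /tmp/exp_pipe

    compress reads whatever the export writes into the pipe, so the
    uncompressed dump never lands on disk.)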
    
    Thanks,
    
    - Cindy
9184.1. "Compression depends on what you're compressing." by WTFN::SCALES (Despair is appropriate and inevitable.) Fri Mar 14 1997 10:08
By no means am I an expert on this stuff, but I'll offer you some of my
understanding...

The compressed file is basically a description of what was in the original
file and where it was.  Thus, the contents of the original file (i.e., the
exact sets of byte patterns) determine how much compression you get.  A file
with lots of repetition (e.g., lots of zeros) compresses well.  A file with
purely random data may not compress at all.  (In fact, depending on the
compression method, it's possible for the "compressed" file to be larger than
the original, although the compression utility will generally predict or
detect this and "fail" instead of creating a larger output file.)
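
You can see this for yourself with a couple of throwaway files (the file
names and sizes below are just examples):

	dd if=/dev/zero of=/tmp/zeros bs=65536 count=160   # ~10MB of pure repetition
	compress -v /tmp/zeros              # shrinks to almost nothing
	mv /tmp/zeros.Z /tmp/hard           # already-compressed data looks random
	compress -v /tmp/hard               # little or no saving; without -f,
	                                    # compress leaves the file alone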

So, your compression results vary from day to day because the format or
contents of your data file are varying from day to day.


				Webb
9184.2. "Still confused..." by MELEE::GERACE (Cindy Gerace @297-3884) Fri Mar 14 1997 12:15
    
    Webb,
    
    Thanks for your reply.  What I don't understand is that we added very
    little data to the database on Thursday, yet Thursday night's
    compressed export file was 3gig bigger than Wednesday night's.  Both
    jobs ran in about 8 hours, but the compression rate was much higher on
    Wednesday.  The only difference I can see is that I started Wednesday
    night's job at 8:30pm and Thursday night's at 10:30pm.  I think we have
    disk backups running after 11pm - could they affect the compression
    rate?
    
    - Cindy
    
9184.3. "Some ideas to stimulate ideas - maybe" by UNIFIX::HARRIS (Juggling has its ups and downs) Fri Mar 14 1997 16:21
    Why not supply some additional information, such as the actual
    commands used to do this export/compression?  Then at least someone
    might have a clue as to the compression algorithm being used.

    Do you know if the Oracle export function writes sparse data, or every
    byte (real or imagined) in the file used by the database?
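
    One quick way to see whether the database files themselves contain
    holes is to compare ls -l (logical size) with du (blocks actually
    allocated); a file that's much bigger by ls than by du is sparse.  The
    path below is only an example:

    	ls -l /oracle/dbf/users01.dbf
    	du -k /oracle/dbf/users01.dbf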
    
    Does anyone know whether Oracle builds cached lookups in the database
    file itself, such that lots of lookups from different perspectives
    might create lots of file changes that aren't caused by updates but
    still end up being written to the backup medium?
    
    I'm just guessing, and trying to stimulate ideas.
    
    Me, I don't know a thing about compression except things generally get
    smaller, and even less about databases.
    
    The other thing you could do is to try alternate compression engines
    on a sample of the data (see the example after this list):
    
    	compress
    	gzip
    	gzip -9
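
    For example (the sample size and file names here are just guesses --
    pull ~100MB back out of one night's compressed dump and try each
    engine on it):

    	uncompress -c export.dmp.Z | dd bs=65536 count=1600 of=/tmp/sample
    	compress -c /tmp/sample > /tmp/sample.Z
    	gzip -c     /tmp/sample > /tmp/sample.gz
    	gzip -9 -c  /tmp/sample > /tmp/sample.gz9
    	ls -l /tmp/sample /tmp/sample.Z /tmp/sample.gz /tmp/sample.gz9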
    
    					Bob Harris
9184.4. "The compressed file is a "description" of the original." by WTFN::SCALES (Despair is appropriate and inevitable.) Fri Mar 14 1997 17:08
The compressed file is a "description" of the original.  That is, if the
contents of the file are easy to describe, then a small/simple description is
all that is required, and the compression will be very good (i.e., the "rate"
will be high).  If the contents of the file are hard to describe, then a large
and complex description is required, and the compression will be relatively poor.

.2> What I don't understand is that we added very little data to the database on 
.2> Thursday, yet Thursday night's compressed export file was 3gig bigger than 
.2> Wednesday night's.  

It doesn't matter how much data you changed or added; what matters is what you
changed it to and where it ended up in the file.  That is, Thursday's compressed
file was bigger not because you added data to it but because the data that you
added messed up the nice characteristics that Wednesday's file had -- you changed
the _patterns_ of the data in the file (creating more distinct patterns, or the
equivalent), so the "description" of the file had to be more complex, resulting
in a larger output.

.2> Both jobs ran in about 8 hours

I believe that the length of the run is related almost entirely to the input
size; it has basically nothing to do with the output size.

.2> I think we have disk backups running after 11pm - could they affect the 
.2> compression rate?

Only if you're talking "bytes/second"...  ;-)  No, the compression "rate" (in
terms of the ratio of the output size to the input size) should be deterministic
and unrelated to system (CPU or I/O) load.
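
If you want to track the ratio exactly from night to night, one trick
(assuming the export goes through a named pipe as in .0, and with made-up
path names here) is to drop dd into the pipeline; its end-of-run record
counts tell you how many bytes went in, and ls -l on the .Z file tells you
how many came out:

	dd if=/tmp/exp_pipe bs=65536 2>> /tmp/export_size.log | \
		compress > export.dmp.Z
	# dd reports "N+M records in" at 64KB per record, which gives the
	# uncompressed size to within one block.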


				Webb