
Conference turris::digital_unix

Title:DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

9184.0. "Compression using pipe - what causes variations?" by MELEE::GERACE (Cindy Gerace @297-3884) Fri Mar 14 1997 09:26

    We are running V3.2c on an Alpha 8400 (huge machine - 8 cpus, 8gig of
    memory).  We have an Oracle database and export the data to a file
    using a pipe to compress it.  Yesterday the export file was 9gig, today
    it is 12gig.  The compression rate yesterday was 39%, today it was 20%. 
    Does anyone know what would cause the compression rate to vary from day
    to day?  We've had it as high as 42% and as low as 5%.  Since it really
    affects the size of the export file, we'd like to get as much
    compression as possible.
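
    (In case it matters, the pipeline is roughly the usual named-pipe
    arrangement -- this is a simplified sketch rather than our exact
    commands, and the pipe name, output file, and userid below are only
    illustrative:

    	mkfifo /tmp/exp_pipe                    # or: mknod /tmp/exp_pipe p
    	compress < /tmp/exp_pipe > export.dmp.Z &
    	exp userid=system/manager full=y file=/tmp/exp_pipe
    	rm /tmp/exp_pipe

    compress reads whatever the export writes into the pipe, so the
    uncompressed dump never lands on disk.)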
    
    Thanks,
    
    - Cindy
9184.1. "Compression depends on what you're compressing." by WTFN::SCALES (Despair is appropriate and inevitable.) Fri Mar 14 1997 10:08
By no means am I an expert on this stuff, but I'll offer you some of my
understanding...

The compressed file is basically a description of what was in the original
file and where it was.  Thus, the contents of the original file (i.e., the
exact sets of byte patterns) determine how much compression you get.  A file
with lots of repetition (e.g., lots of zeros) compresses well.  A file with
purely random data may not compress at all.  (In fact, depending on the
compression method, it's possible for the "compressed" file to be larger than
the original, although the compression utility will generally predict or
detect this and "fail" instead of creating a larger output file.)
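
You can see this for yourself with a couple of throwaway files (the file
names and sizes below are just examples):

	dd if=/dev/zero of=/tmp/zeros bs=65536 count=160   # ~10MB of pure repetition
	compress -v /tmp/zeros              # shrinks to almost nothing
	mv /tmp/zeros.Z /tmp/hard           # already-compressed data looks random
	compress -v /tmp/hard               # little or no saving; without -f,
	                                    # compress leaves the file alone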

So, your compression results vary from day to day because the format or
contents of your data file are varying from day to day.


				Webb
9184.2. "Still confused..." by MELEE::GERACE (Cindy Gerace @297-3884) Fri Mar 14 1997 12:15
    
    Webb,
    
    Thanks for your reply.  What I don't understand is that we added very
    little data to the database on Thursday, yet Thursday night's
    compressed export file was 3gig bigger than Wednesday night's.  Both
    jobs ran in about 8 hours, but the compression rate was much higher on
    Wednesday.  The only difference I can see is that I started Wednesday
    night's job at 8:30pm and Thursday night's at 10:30pm.  I think we have
    disk backups running after 11pm - could they affect the compression
    rate?
    
    - Cindy
    
9184.3. "Some ideas to stimulate ideas - maybe" by UNIFIX::HARRIS (Juggling has its ups and downs) Fri Mar 14 1997 16:21
    Why not supply some additional information, such as the actual
    commands used to do this export/compression?  Then at least someone
    might have a clue as to the compression algorithm being used.

    Do you know if the Oracle export function writes sparse data, or every
    byte (real or imagined) in the file used by the database?
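
    One quick way to see whether the database files themselves contain
    holes is to compare ls -l (logical size) with du (blocks actually
    allocated); a file that's much bigger by ls than by du is sparse.  The
    path below is only an example:

    	ls -l /oracle/dbf/users01.dbf
    	du -k /oracle/dbf/users01.dbf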
    
    Does anyone know whether Oracle builds cached lookups in the database
    file itself, such that lots of lookups from different perspectives
    might create lots of file changes that aren't caused by updates but
    still end up being written to the backup medium?
    
    I'm just guessing, and trying to stimulate ideas.
    
    Me, I don't know a thing about compression except things generally get
    smaller, and even less about databases.
    
    The other thing you could do is to try alternate compression engines
    on a sample of the data (see the example after this list):
    
    	compress
    	gzip
    	gzip -9
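
    For example (the sample size and file names here are just guesses --
    pull ~100MB back out of one night's compressed dump and try each
    engine on it):

    	uncompress -c export.dmp.Z | dd bs=65536 count=1600 of=/tmp/sample
    	compress -c /tmp/sample > /tmp/sample.Z
    	gzip -c     /tmp/sample > /tmp/sample.gz
    	gzip -9 -c  /tmp/sample > /tmp/sample.gz9
    	ls -l /tmp/sample /tmp/sample.Z /tmp/sample.gz /tmp/sample.gz9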
    
    					Bob Harris
9184.4. "The compressed file is a "description" of the original." by WTFN::SCALES (Despair is appropriate and inevitable.) Fri Mar 14 1997 17:08
The compressed file is a "description" of the original.  That is, if the
contents of the file are easy to describe, then a small/simple description is
all that is required, and the compression will be very good (i.e., the "rate"
will be high).  If the contents of the file are hard to describe, then a large
and complex description is required, and the compression will be relatively poor.

.2> What I don't understand is that we added very little data to the database on 
.2> Thursday, yet Thursday night's compressed export file was 3gig bigger than 
.2> Wednesday night's.  

It doesn't matter how much data you changed or added; what matters is what you
changed it to and where it ended up in the file.  That is, Thursday's compressed
file was bigger not because you added data to it but because the data that you
added messed up the nice characteristics that Wednesday's file had -- you changed
the _patterns_ of the data in the file (creating more distinct patterns, or the
equivalent), so the "description" of the file had to be more complex, resulting
in a larger output.

.2> Both jobs ran in about 8 hours

I believe that the length of the run is related almost entirely to the input
size; it has basically nothing to do with the output size.

.2> I think we have disk backups running after 11pm - could they affect the 
.2> compression rate?

Only if you're talking "bytes/second"...  ;-)  No, the compression "rate" (in
terms of the ratio of the output size to the input size) should be deterministic
and unrelated to system (CPU or I/O) load.
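
If you want to track the ratio exactly from night to night, one trick
(assuming the export goes through a named pipe as in .0, and with made-up
path names here) is to drop dd into the pipeline; its end-of-run record
counts tell you how many bytes went in, and ls -l on the .Z file tells you
how many came out:

	dd if=/tmp/exp_pipe bs=65536 2>> /tmp/export_size.log | \
		compress > export.dmp.Z
	# dd reports "N+M records in" at 64KB per record, which gives the
	# uncompressed size to within one block.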


				Webb