T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---
9184.1 | Compression depends on what you're compressing. | WTFN::SCALES | Despair is appropriate and inevitable. | Fri Mar 14 1997 10:08 | 17 |
| By no means am I an expert on this stuff, but I'll offer you some of my
understanding...
The compressed file is basically a description of what was in the original
file and where it was. Thus, the contents of the original file (i.e., the
exact sets of byte patterns) determine how much compression you get. A file
with lots of repetition (e.g., lots of zeros) compresses well. A file with
purely random data may not compress at all. (In fact, depending on the
compression method, it's possible for the "compressed" file to be larger than
the original, although the compression utility will generally predict or
detect this and "fail" instead of creating a larger output file.)
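You can see this for yourself with a few lines of Python -- just a sketch, with zlib standing in for whatever compressor your export job actually uses:

    import os, zlib

    repetitive = b"\x00" * 1000000        # 1 MB of zeros -- lots of repetition
    random_data = os.urandom(1000000)     # 1 MB of purely random bytes

    for name, data in (("repetitive", repetitive), ("random", random_data)):
        out = zlib.compress(data)
        print("%-10s %d -> %d bytes" % (name, len(data), len(out)))

    # The zeros shrink to a few KB; the random bytes actually come out a
    # little LARGER than the input (compressor framing overhead), which is
    # why utilities check for this and fall back to storing the original.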
So, your compression results vary from day to day because the format or
contents of your data file vary from day to day.
Webb
|
9184.2 | Still confused... | MELEE::GERACE | Cindy Gerace @297-3884 | Fri Mar 14 1997 12:15 | 14 |
|
Webb,
Thanks for your reply. What I don't understand is that we added very
little data to the database on Thursday, yet Thursday night's
compressed export file was 3gig bigger than Wednesday night's. Both
jobs ran in about 8 hours, but the compression rate was much higher on
Wednesday. The only difference I can see is that I started Wednesday
night's job at 8:30pm and Thursday night's at 10:30pm. I think we have
disk backups running after 11pm - could they affect the compression
rate?
- Cindy
|
9184.3 | Some ideas to stimulate ideas - maybe | UNIFIX::HARRIS | Juggling has its ups and downs | Fri Mar 14 1997 16:21 | 24 |
| Why not supply some additional information, such as the actual
commands used to do this export/compression? Then at least someone
might have a clue as to the compression algorithm being used.
Do you know if the Oracle export function writes sparse data, or every
byte (real or imagined) in the file used by the database?
Does anyone know if Oracle builds cached lookups in the database file
itself, such that lots of lookups from different perspectives might
create lots of file changes that are not typically caused by updates,
but might still result in data being written to the backup medium?
I'm just guessing, and trying to stimulate ideas.
Me, I don't know a thing about compression except things generally get
smaller, and even less about databases.
The other thing you could do is try alternate compression engines
(see the sketch below):
    compress
    gzip
    gzip -9
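If you'd rather measure than guess, here's a rough Python sketch that
streams the same file through zlib at different levels (the path argument
is just a placeholder for your export file; zlib's levels correspond
roughly to gzip's -1 / default / -9):

    import sys, zlib

    path = sys.argv[1]                    # e.g. your nightly export file
    for level in (1, 6, 9):               # roughly gzip's -1 / default / -9
        comp = zlib.compressobj(level)
        total_in = total_out = 0
        with open(path, "rb") as f:
            while True:
                chunk = f.read(1 << 20)   # stream in 1 MB chunks
                if not chunk:
                    break
                total_in += len(chunk)
                total_out += len(comp.compress(chunk))
        total_out += len(comp.flush())
        print("level %d: %d -> %d bytes (%.1f%%)"
              % (level, total_in, total_out, 100.0 * total_out / max(total_in, 1)))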
Bob Harris
|
9184.4 | The compressed file is a "description" of the original. | WTFN::SCALES | Despair is appropriate and inevitable. | Fri Mar 14 1997 17:08 | 32 |
| The compressed file is a "description" of the original. That is, if the
contents of the file are easy to describe, then a small/simple description is
all that is required, and the compression will be very good (i.e., the "rate"
will be high). If the contents of the file are hard to describe, then a large
and complex description is required, and the compression will be relatively poor.
.2> What I don't understand is that we added very little data to the database on
.2> Thursday, yet Thursday night's compressed export file was 3gig bigger than
.2> Wednesday night's.
It doesn't matter how much data you changed or added; what matters is what you
changed it to and where it ended up in the file. That is, Thursday's compressed
file was bigger not because you added data to it but because the data that you
added disrupted the nice characteristics that Wednesday's file had -- you changed
the _patterns_ of the data in the file (creating more distinct patterns, or the
equivalent), so the "description" of the file had to be more complex, resulting
in a larger output.
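Here's a little sketch of that point, again with zlib as a stand-in for
whatever your export job uses: two additions of exactly the same size, but
with very different effects on the compressed output:

    import os, zlib

    base = b"ABCD" * 250000               # ~1 MB of one repeating pattern
    same_pattern = b"ABCD" * 2500         # 10 KB more of the same pattern
    new_patterns = os.urandom(10000)      # 10 KB of new, irregular patterns

    print(len(zlib.compress(base + same_pattern)))   # barely bigger than base alone
    print(len(zlib.compress(base + new_patterns)))   # roughly 10 KB bigger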
.2> Both jobs ran in about 8 hours
I believe that the length of the run is related almost entirely to the input
size; it has basically nothing to do with the output size.
.2> I think we have disk backups running after 11pm - could they affect the
.2> compression rate?
Only if you're talking "bytes/second"... ;-) No, the compression "rate" (in
terms of the ratio of the output size to the input size) should be deterministic
and unrelated to system (CPU or I/O) load.
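If you want to convince yourself, a quick sketch (assuming a single-threaded
compressor like zlib -- same bytes in, same bytes out, no matter what else
the machine is doing):

    import os, zlib

    data = os.urandom(100000)             # any fixed input
    sizes = set(len(zlib.compress(data)) for _ in range(5))
    print(sizes)                          # exactly one size, every run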
Webb
|