T.R | Title | User | Personal Name | Date | Lines |
---|
387.1 | | CAFEIN::PFAU | You can't get there from here | Tue Jan 20 1987 17:50 | 5 |
| There are various programs on PAR5 which implement Huffman
compression/decompression. Look for SQ, USQ or any file containing
these strings in MSDOS:, CPM:, CC:, TURBO:.
tom_p
|
387.2 | | 60592::MOSS | cogito cogito ergo cogito sum. | Wed Jan 21 1987 03:39 | 14 |
| It could come from Un*x pack. This is another very popular compression
program.
Unfortunately, there are many different compression programs around,
and many use different representations of the compressed data.
This can make recovery difficult if the program used to compress
the original is not known.
Huffman coding uses a representation based on the frequency of
occurrence of each symbol, and represents frequent symbols with a
minimal number of bits. The first thing in the compressed
file is a table enabling the decompression program to reconstruct
the data from the compressed info.
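For example (made-up frequencies): if 'E' occurs about half the time, 'T' a
quarter, and 'A' and 'Q' an eighth each, the coder might assign E=0, T=10,
A=110, Q=111. The common letters then cost 1 or 2 bits and only the rare
ones cost 3, for an average of .5*1 + .25*2 + .125*3 + .125*3 = 1.75 bits
per character instead of 8.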
|
387.3 | table is in the file | PLDVAX::ZARLENGA | Bigger they are, Harder they hit | Thu Jan 22 1987 08:43 | 18 |
| > I understand how the algorithm works (conceptually). In a nutshell,
> the algorithm uses a predefined table to compress the data. Therefore,
> each implementation of the Huffman algorithm uses its own table
> and therefore generates its own compressed data. The decompression
> of that data must have knowledge of the table used.
The table that is used is dumped into the compressed file, usually
right at the beginning. The way to de-compress the data is to read
that table (you must know its format) and then read the file looking
for bit patterns to selectively expand your data back to its original
state.
To do this you must know the way the table is represented (its
format), and the compression/de-compression algorithm. As mentioned
in a previous note, SQ (squeeze) and USQ (unsqueeze) make use of
Huffman compression; you can scan the sources for the algorithm
if necessary.
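As a rough illustration (this is only a sketch in C, with a node layout
I've invented - it is NOT SQ's actual table format), the expand loop is
just a walk down a binary tree, one input bit at a time:

    #include <stdio.h>

    /* hypothetical node layout: a leaf has lchild == -1 and holds the
     * decoded byte in sym; an interior node has two children */
    struct node {
        int lchild, rchild;
        int sym;
    };

    /* expand nbits of Huffman-coded data from 'in' to 'out' */
    void expand(struct node *tree, int root, FILE *in, FILE *out, long nbits)
    {
        int cur = root, byte = 0, mask = 0;
        long i;

        for (i = 0; i < nbits; i++) {
            if (mask == 0) {              /* refill the bit buffer */
                byte = getc(in);
                mask = 0x80;              /* assume high bit sent first */
            }
            cur = (byte & mask) ? tree[cur].rchild : tree[cur].lchild;
            mask >>= 1;
            if (tree[cur].lchild == -1) { /* reached a leaf - emit it */
                putc(tree[cur].sym, out);
                cur = root;               /* next code starts at the root */
            }
        }
    }

The hard part in your case is exactly what this sketch assumes away:
knowing how the table is stored in the file so the tree can be rebuilt
before the loop starts.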
-mike
|
387.4 | The table doesn't need to go with the data... | YALI::LASTOVICA | Norm Lastovica | Thu Jan 22 1987 17:38 | 4 |
| I don't think that the compression table (that's what it's called
isn't it??) HAS to be part of the compressed object. Sometimes
a program will put it there, but it is not a requirement of the
Huffman system.
|
387.5 | checking... | SMAUG::FLOWERS | I know what I know, I think. | Fri Jan 23 1987 12:46 | 19 |
| Thanks for the help. The PAR5 reference will come in handy in due
time.
I've just received a couple of traces of Huffman-compressed data. I'll
be trying to locate a common table in these files soon.
Thanks for the help. It was an 'eye-opener'.
A good Huffman compression table will generate on average 2.5
bits per character. It would seem to me that including the compression
table with each transmission of a compressed file would defeat its
own purpose - just my thoughts...
Let's hope you're right anyway (that the table is included), it'll
make this task much easier.
Thanks,
Dan
|
387.6 | | CLOVAX::COBURN | John Coburn, Cleveland | Fri Jan 23 1987 15:02 | 17 |
| < Note 387.5 by SMAUG::FLOWERS "I know what I know, I think." >
The Huffman implementations that I have seen (used on CPM and MSDOS)
always include the table in the compressed file because each file
generates a unique compression table. This means that compressing
using the Huffman technique can in some cases (an even distribution of
all 256 byte values) cause the 'compressed' file to be larger.
This is not often the case, especially in executables (lots of nulls,
etc.).
If you are interested in some FORTRAN code see VMSSWEEP.FOR located
at PAR5""::VAX_TOOLS:
Most of the other programs are written in C.
John
|
387.7 | crazy stuff kids | SMAUG::FLOWERS | I know what I know, I think. | Fri Jan 23 1987 17:24 | 15 |
| Well, I've determined that 'they' don't use the same compression
table each time. In fact, consecutive transmissions of the *same*
file produce different results. I attribute this to time-of-day
stamps that probably get used in the generation of each table.
Conclusions : the table must be included in the transmission and
it is different each time.
Any thoughts as to how I can locate the table in this mess? Does
it have a set format, or does the table also have a variable length
and size?
Thanks again,
Dan
|
387.8 | Hum, I'd think that the compression would be constant | YALI::LASTOVICA | Norm Lastovica | Fri Jan 23 1987 19:14 | 8 |
| Generating the compression table on a per-file basis is usually
done so that the file will be compressed as much as possible. That is,
the compression program ranks the characters from the most used through
the least used and builds a table based on this information. I'd
think that the same file being compressed would yield the same results.
The encoding is chosen (in my experience) to give the most compression.
If data encryption is wanted as well, then putting the table in
the file doesn't help that much.
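The first pass is nothing more than a byte-value histogram - a minimal
sketch in C (mine, not taken from any particular squeezer):

    #include <stdio.h>

    long freq[256];

    int main(void)
    {
        int c;

        while ((c = getchar()) != EOF)
            freq[c]++;                  /* tally each byte value */
        for (c = 0; c < 256; c++)
            if (freq[c] > 0)
                printf("%3d occurs %ld times\n", c, freq[c]);
        return 0;
    }

The counts are then fed to whatever builds the code table, so two runs
over the same input really ought to produce the same table.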
|
387.9 | | CAFEIN::PFAU | You can't get there from here | Fri Jan 23 1987 20:29 | 8 |
| Do you really need Huffman encoding? The Lempel-Ziv algorithms
usually do quite well (and quickly, too) and aren't that difficult to
understand. This algorithm works 'on the fly'. No need for
intermediate files, no need to pass the table to the decompressor.
Just read from your input stream, compress, and write to your output
stream. Look in PAR5::MSDOS:LZ*.ARC for a bit more information...
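To give a feel for why no table is needed, here is a minimal LZW encoder
sketch (my own simplification - codes are printed as decimal integers
rather than packed into 9..12 bit fields, and the string table is
searched linearly):

    #include <stdio.h>

    #define MAXCODES 4096

    int prefix[MAXCODES];       /* code of the prefix string */
    int suffix[MAXCODES];       /* last byte of the string */
    int ncodes = 256;           /* codes 0..255 stand for single bytes */

    int find(int pfx, int c)    /* is the string (pfx + c) in the table? */
    {
        int i;
        for (i = 256; i < ncodes; i++)
            if (prefix[i] == pfx && suffix[i] == c)
                return i;
        return -1;
    }

    int main(void)
    {
        int c, w, k;

        if ((w = getchar()) == EOF)
            return 0;
        while ((c = getchar()) != EOF) {
            if ((k = find(w, c)) >= 0) {
                w = k;                  /* extend the current string */
            } else {
                printf("%d\n", w);      /* emit code for current string */
                if (ncodes < MAXCODES) {
                    prefix[ncodes] = w; /* remember the new string */
                    suffix[ncodes] = c;
                    ncodes++;
                }
                w = c;                  /* start over with this byte */
            }
        }
        printf("%d\n", w);
        return 0;
    }

The decompressor rebuilds exactly the same table as it reads the codes,
which is why nothing has to be sent ahead of the data.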
tom_p
|
387.10 | unlocking mysteries | SMAUG::FLOWERS | I know what I know, I think. | Sun Jan 25 1987 13:29 | 18 |
| First : Yes I do have to use Huffman. I am receiving Huffman
compressed data from a remote source (see .0).
Second : I may have jumped the gun a bit saying that identical files
are compressed differently. I sent the same file four (4) times
through the remote source and at first glance the
compressed data doesn't even look close. However, with variable
length bit patterns, one bit off at the beginning (such as a different
time-of-day stamp, which I think does get sent) will make the whole
thing look different each time.
I really don't relish having to sit down and analyze the compressed
data in binary looking for patterns.
I'll keep this note updated....
Dan
|
387.11 | | CLT::GILBERT | eager like a child | Mon Jan 26 1987 00:08 | 3 |
| How many bytes get sent for a null file? For a 256-byte file
containing one (each) of the 256 characters? For a 256-byte
file containing the same character duplicated 256 times?
|
387.12 | some comments | 60592::MOSS | cogito cogito ergo cogito sum. | Mon Jan 26 1987 20:42 | 8 |
| 1) What are the first 3 or 4 bytes in the compressed file?
Most compression programs use a "magic number" to tell that the
file has been compressed. If we know this, we can probably tell
what the compression program was.
2) Is the data sensitive? If not, why not make it readable and
have a 'decompress this' contest.
|
387.13 | If I only had the time... | SMAUG::FLOWERS | I know what I know, I think. | Tue Jan 27 1987 09:23 | 10 |
| Unfortunately this task has taken a back seat to more important
(but related) issues. When I get back into the swing of it - maybe
about 3-4 days - I'll let you know what's happening.
In the meantime, any other thought/comments/suggestions are more
than welcome...
Thanks,
Dan
|
387.14 | tell us more | PLDVAX::ZARLENGA | Bigger they are, Harder they hit | Wed Jan 28 1987 17:24 | 14 |
| The table will be in the beginning. Most likely it will start
within 16 bytes. The format is anyone's guess. If you know the
source (assuming you're not stealing confidential information),
you should be able to acquire (in other words, not have to reverse
engineer) the knowledge to de-compress.
Please explain further what kind of data this is and what the
name of the compressor is.
The time-of-day stamp may indeed be part of the file header;
in that case, the table should start within 16 bytes of the end
of the file header.
-mike
|
387.15 | origin currently confidential | SMAUG::FLOWERS | I know what I know, I think. | Thu Jan 29 1987 10:57 | 13 |
| We don't have any access to the source (aside from illegal actions,
we can't get it either). All we have to work from is a hex dump
of the file (only part of which is compressed - but we know where
it starts).
We can get unlimited hex dumps of any file we wish to create and
push through the compressor (except a null file - it won't allow
that).
I plan on starting to look at it seriously starting today and/or
tomorrow.
|
387.16 | | ANGORA::ZARLENGA | Bigger they are, Harder they hit | Thu Jan 29 1987 14:30 | 13 |
| Can you disassemble the compressor program? Depending upon
your abilities, it may not be too difficult to read through the
assembler code to figure out how the program works. I did this
once and can tell you it will not be as hard as you expect as
long as you're familiar (make that very familiar) with the op
sys and assembler.
Another possibility is feeding the program known files
which should produce known results. Knowing, for example,
that the file contained all A's might make spotting and
deciphering the table easier.
-mike
|
387.17 | we're trying | SMAUG::FLOWERS | I know what I know, I think. | Thu Jan 29 1987 14:59 | 11 |
| Yes, the source could be disassembled. We had thought of this
originally. However, for two reasons it is not an option. 1) The
source is copyright protected and disassembling is illegal, and 2)
the person in charge of the remote system wouldn't let us near it
to try that.
We *are* sending known patterns, and are trying to decipher and
recognize patterns that way. But damn is that a long process.
Dan
|
387.18 | May be useful | BARNA::SOLEPONT | Jaume, "Barcelona 1992" more than ever | Thu Jan 29 1987 15:34 | 26 |
| Try feeding the program with a file containing this record:
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXYYYYYYYYYYYYYYYYZZZZZZZZ
(32*X, 16*Y, 8*Z, Cr, Lf)
I think this should compress to something like this:
[Header] [Tables] 00000000 AAAAAAAA DB6DB6 10
or
[Header] [Tables] FFFFFFFF 55555555 249249 EF
or similar repeating patterns, as long as
X translates to 1 bit
Y translates to 2 bits
Z translates to 3 bits
Cr and Lf to 4 bits each one
Repeat the operation for:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBCCCCCCCC
and compare [Tables].
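(The standard build-the-tree arithmetic backs those lengths up. The
counts are X=32, Y=16, Z=8, Cr=1, Lf=1; merging the two smallest each
time gives

        Cr(1) + Lf(1)  ->  2
        2     + Z(8)   -> 10
        10    + Y(16)  -> 26
        26    + X(32)  -> 58

so X ends up 1 level below the root (1 bit), Y at 2, Z at 3 and Cr/Lf
at 4, for 32*1 + 16*2 + 8*3 + 2*4 = 96 bits = 12 bytes of coded data
after the header and tables.)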
Regards, *Jaume
|
387.19 | Is this the right problem? | PASTIS::MONAHAN | | Thu Jan 29 1987 16:55 | 29 |
| This sounds like a most amusing technical problem, but as a practical
problem something is wrong.
1) Huffman coding is public. That is what the VMS supplied compression
routines do. They do not violate anyone's copyright. It is legal
to read copyright code, just as it is legal to read a copyright
book.
2) Reverse engineering is legal in most countries. Why not crack
their code to find out what they are doing, and then put a minor
patch in the VMS routines. It can't be much of a change since they
are both Huffman algorithms.
3) If there is really something copyright then you may be wasting
your time anyway. Apple got a judgement (I heard) against someone
who made something that "looked like" a Mac, so your output had
better not look like the output from their decoder????
4) It has to be in *someone's* interests other than DEC to crack
this code. We have enough data of our own without worrying about
anyone else's. :-( Can they not do better than play games with
you by supplying a black box that you are expected to solve?
In summary, I am not well acquainted with U.S. law, but it sounds
to me as if someone is being deliberately awkward, and maybe you
need legal advice rather than technical.
Dave
|
387.20 | Let's have a crack at the code | YALI::LASTOVICA | Norm Lastovica | Thu Jan 29 1987 19:17 | 15 |
| It appears (from .0) that you've got a file that is encoded
(compressed) such that you can not read it. If the sender wishes
you to read it, they'd supply a decryption program/technique. A
Huffman encoding scheme will change with each encryption since the
count of bits per character is based on the frequency of occurrence
of the character in the file. Thus, the encryption code will change.
If the program that does the compression is not available to you
(and this sounds true) and if they (the suppliers of the program)
are unwilling to let you decode the data, then you are stuck.
However, in the true spirit of hacking, if you can encrypt a number
(dozen) of files of known input and then DUMP the resulting files, perhaps
a good decryptor (is there a word for this?) or two among us could
help. I do get excited about these challenges. How 'bout it?
|
387.21 | | PASTIS::MONAHAN | | Fri Jan 30 1987 03:40 | 8 |
| We already have perfectly good Huffman decompression routines.
Why not use those, and get the sender of data to match. He can use
our routines if he has a VAX, or patch his own routines to match
the minor details of the algorithm if he has not.
In any case, I doubt it is much more difficult to code the
compression than the decompression. Maybe you are trying to implement
the wrong end of the link.
|
387.22 | | SMAUG::FLOWERS | I know what I know, I think. | Fri Jan 30 1987 11:01 | 10 |
| We *are* seeking legal advice. Until I get the word from them, I
can't give any more details about the remote source than what I've
already given. Suffice it to say that the remote source would rather
we didn't figure out how it built its compression table(s). We wish
to communicate with them (not them with us) and therefore we have
to understand what/how they're saying.
Regards,
Dan
|
387.23 | Too much mystery | BARNA::SOLEPONT | Jaume, "Barcelona 1992" more than ever | Fri Jan 30 1987 12:12 | 7 |
| > I can't give anymore details about the remote source than what I've
> already given. We wish to communicate with them (not them with us)
> and therefore we have to understand what/how they're saying.
It seems like CLOSE ENCOUNTERS OF THE *FOURTH* KIND! ;-)
*Jaume
|
387.24 | | MKTUP1::EIBEN | | Fri Jan 30 1987 13:07 | 41 |
| Aside from the questionable 'mystery' - as long as You know enough
about the internals of the 'other' system - You might be able to
'reverse-engineer'...
Here are the 'mights':
HUFFMAN encoding IS NOT BOUND to ANY BYTE-size -- in fact I started
using it about 10 years ago on TOPS [and used 6, 7 and 8-bit bytes,
depending on data-structure: 6 for COBOL and executables, 7 for text
and 8 for micro-executables].
The translation-table is 'fixed-size' in that it has to hold all
occurring members of the SET in the compressed file -- so it can
be relatively short for a file comprised of only the same character.
Depending on the implementation, the original specs (FILENAME etc.) are
typically the first piece of the compressed file [plus the VERSION-NR of
the compressor] - that probably 'explains' why the same file has slightly
different contents at the beginning.
LZW techniques don't need to send 'tables' since COMPRESSOR and
DE-Compressor follow the same 'SET OF RULES' in building the
compression-table. Here again typically a Compression-Version
Nr is included - followed by a file-spec. There are currently
[aside from the more-or-less standard UNIX-COMPRESS] at least 3
more 'enhancements' to LZW in use.
Both Compression techniques have been mentioned as 'encryption'
techniques - i.e. IF YOU MISS a SINGLE BIT - You'll be even with
FULL KNOWLEDGE completely OFF SYNCH till You happen to see an
'out-of-band' RESET {LZW} or get another file {Huffman}. Since
one of the current LZW implementations doesn't use a straight
RESET but only an 'adaptive' RESET, it'll be even more 'cryptic' -
although it also compresses more.
Huffman {old technology - SLOW and CUMBERSOME} typically compresses
text by 40% and executables by 15%. LZW crunches text by 50% and
executables by 20% - and [best of all] crunches Pixel-info by about
90%.
Rgds,
Bernie.
|
387.25 | | PASTIS::MONAHAN | | Fri Jan 30 1987 16:26 | 45 |
| "the remote source would rather we didn't figure out how it
built it compression table(s)"
If they don't want you to figure this out, then they had better
supply the tables. If ever you work out how to decompress *without*
being given the tables you will be most of the way to knowing how
the tables were built. You will not be able to decompress without
at least a functional equivalent of their tables, and if you work
it out for yourself then you will be in a much better position to
understand how they were built than if you just took them and used
them.
Good compression looks just like a random bit stream if you
do not know the decompression algorithm (otherwise it could be
compressed more). Please admit that this is a bet that you can't
crack it within a month, and then we can forget the politics and
enjoy the technicalities.
Try taking a large file, and running it through the VMS compression
utility. You will probably reduce it by quite a bit. Then try reducing
the compressed file. That will probably make it a little smaller,
but after about 3 or 4 tries it will start getting larger. This
indicates that the compression algorithm can discover no regularities,
that is, it is essentially random bits to the limits of its analysis.
If I was your customer, and *really* wanted you not to discover
the compression algorithm, then I would take whatever you supplied
and return a random bit stream about half the length. That would
keep you guessing for months.
If the remote end is as unhelpful as you imply, then my guess
is that you are trying a "known plaintext" attack against some form
of encryption that you suspect might be Huffman encoding. If this
second guess is nearer than the ten dollar bet guess, then I would
suggest you get all of this note removed from such a widely available
forum.
Incidentally, the previous suggestions for patterns of characters
to try are good. Other things you should try are sending the same
file multiple times until you understand where the time or sequence
number affects things. Then try the same file but with one bit changed,
and see how far that bit change affects things. How sure are you that
the encoding is Huffman?
Dave
|
387.26 | 1-gram, 2-gram ,3 | TAV02::NITSAN | Duvdevani, DEC Israel | Thu Feb 12 1987 05:30 | 14 |
| Re. -1,-2:
The simple way to use Huffman code is to compress on a single character
distribution basis. For example: "E" is much more common than "Z", therefore
"E" will be represented by less bits than "Z". This is why after the
compression you get those "random" bits. HOWEVER, if this "simple" method
was chosen, NOT all the redundancy has been removed. There are PAIRS or
even larger sequences of letters that are more common (for example: "TH"
is much much more common than "CB" and "THE" is much more common than "QQQ").
That of course depends on the source language (maybe a Fortran program source
is compressed...). For very large compressed data, with knowledge of the
source language behavior, this may give some clue...
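A digram (letter-pair) tally over some sample text shows how much of
that redundancy is there to exploit; a quick sketch (mine, nothing to
do with the mystery compressor):

    #include <stdio.h>

    long pair[256][256];

    int main(void)
    {
        int prev, c;

        if ((prev = getchar()) == EOF)
            return 0;
        while ((c = getchar()) != EOF) {
            pair[prev][c]++;                    /* count each adjacent pair */
            prev = c;
        }
        for (prev = 'A'; prev <= 'Z'; prev++)   /* report the common pairs */
            for (c = 'A'; c <= 'Z'; c++)
                if (pair[prev][c] > 50)         /* arbitrary threshold */
                    printf("%c%c : %ld\n", prev, c, pair[prev][c]);
        return 0;
    }

Run it over English text and pairs like "TH" and "HE" float to the top -
that is exactly the redundancy a single-character Huffman code leaves
behind.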
Nitsan.
|
387.27 | A new kid hitting the streets ... on his head. | SMAUG::MENDEL | | Wed Feb 25 1987 16:32 | 81 |
|
The Huffman code mentioned in .0 has been 99% cracked.
(I am a co-worker of Mr. Flowers who has taken on the project.)
It was accomplished by sending a text containing all 256 characters,
each character separated by another, known, character. I.e. /A/B/C/...
Then the output of the 'compressor' was expanded into binary form
and analyzed. If this sounds tedious, it was. After an hour of
staring at 1's and 0's, they start moving on the paper...
The results were then verified against another known text. Bingo!
Some things I found out are:
1) There is no table in the beginning of the compressed text.
The compressor uses the same table every time, and expects the
receiver to have it also.
This leads me to believe that the intent was not so much
towards compression as towards encryption.
2) There are "dead ends" (more than one) in the tree.
Huffman codes are generally
represented by a binary tree, each node either being a leaf containing
a character or having exactly two branches. The farther down the tree,
the less common the character. The bit string of the coded
character is the path from the root of the tree to the leaf the
character is on: 0 = take left branch, 1 = take right branch.
Anyways, after decoding I found that there were nodes
with only one branch being used, the other a "dead end". A bit sequence
beginning with a path to the dead end is, thus, undecodable.
Could these dead ends not be dead ends at all, but be paths
to leaves containing data other than characters? For example,
multiple character sequences (as mentioned some reply earlier)
or control information?
Could they be there as sanity checks? If you hit one
unexpectedly while decoding, you know you have made a mistake and
don't have to finish the document. Normally, in Huffman decoding,
if you make a mistake, you don't know about it: you just decode
erroneously until the end. This seems plausible, too.
The tree-depth/bit-length of the dead ends was 5, up there in
frequency with the letters "E", "1", and "9".
3) This is the worst part. After the code for each byte of the
text, there is MORE compressed data. I cannot as yet decode
it. It begins with a bit sequence that (my luck!) begins with
a dead end sequence! Which means I cannot decode this trailer,
because I don't know if the dead end hides only a leaf, or a
large subtree, and thus I don't know the end of the bit sequence.
I've had all kinds of ideas, none of which seem to work.
Is it just a terminator? It's much too long for that - the equivalent
of about 8 coded characters. Further, the end is otherwise marked
in the encoded text by means of a field giving the length of
the coded data down to the byte. This trailing data is INSIDE
this length.
Is it a checksum of some kind? Byte sum? Bit sum? CRC? (God! No!)
By testing, I've found that the trailing bits can vary greatly
in content and in length. It doesn't seem to be consistent with
anything. Different texts sometimes end almost the same.
Similar texts sometimes end differently.
One thing I'm certain (hacker's intuition) it isn't is data
external to the text (as in a file name or something).
******
Well, I'm hard pressed to figure out just what the trailing data
at the end of the Huffman coded text is.
So I submit it to the rest of the hacking world ...
Write back!
Kevin Mendel
|
387.28 | is it meaningful data? | 37935::ZARLENGA | Bigger they are, Harder they hit | Thu Feb 26 1987 16:05 | 17 |
| Possibly it's random data thrown in at "encode" time and
thrown out at "decode" time?
Either uninitialized memory chunks or time-varying data
such as the time of day would qualify.
If you run the same file (with the EXACT same name) twice,
using the EXACT same command and you get different results
as output, then the "extra" data is time-varying. That means
the kind of info I listed above.
I once did this exact same thing a few years ago. At encode
time I added a sequence to the beginning of each record that told
me the file had been encoded (so the next time it would be decoded),
length info, and selected bits of the time of day. This program
was limited in use since we could not encode .EXEs because the
original might contain the "encode sequence" and the file would
then be decoded instead of encoded.
-mike z
|
387.29 | | MKTUP1::EIBEN | | Tue Mar 03 1987 12:23 | 25 |
| [Only for the record - since I stated earlier already that neither
HUFFMAN without 'dictionary' nor LZW [that one even WITH getting
everything - but not knowing the 'startup-state'] are 'decodable'
in the sense that You would be able to 'port' rules You found
out before to another 'unknown' file.]
Early implementations of HUFFMAN encoding [be aware HUFFMAN only
described the 're-ordering' of byte-encoding - he never gave an
implementation example - BTW - same as LEMPEL-ZIV-WELCH] were
working with "unbalanced trees" [just the stuff You describe] -
"Leaf balancing" was added in/about 1983.
Pre-sending the 'dictionary' is only necessary if one expects byte-
accumulations with heftily changing composition [i.e. software
sources and binaries - which vary from language to language] - it's
quite effective to use an only slightly changing dictionary [by
time and/or by file-type] - same as LZW, which assumes an 'empty'
translation table - but could in case of 'text-only' start with
the alphabet ordered in decreasing usage pattern - or anything else
for that matter.
Good luck in the learning experience
Rgds,
Bernie.
|
387.30 | | SMAUG::MENDEL | | Fri Mar 06 1987 09:42 | 22 |
| I tried very hard not to say "balancing" in .-3. If the Huffman
code is ideal for the data, the more unbalanced the tree is the
more the data will be compressed.
I meant something else.
According to textbook stuff, the way a tree is created is to do
this: Start with a list of characters and their frequencies. Take
the two least common and combine them into one node, as left and
right sons. The parent node is given the frequency of the sum of
the sons. Add the parent to the list, remove the two sons from the
list. Then find the two elements that are least common .... ad
infinitum. In the end, there will be one left, and you will have
your tree.
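Just to pin the textbook process down, here is a minimal sketch of that
construction in C (array layout and names are mine, not anyone's file
format):

    #define NSYM  256
    #define NNODE (2 * NSYM - 1)

    long weight[NNODE];              /* node frequencies */
    int  left[NNODE], right[NNODE];  /* children; -1 means leaf */
    int  alive[NNODE];               /* 1 while still on the "list" */

    /* index of the least-weight live node, excluding 'skip' (-1 = none) */
    static int least(int nnodes, int skip)
    {
        int i, best = -1;
        for (i = 0; i < nnodes; i++)
            if (alive[i] && i != skip &&
                (best < 0 || weight[i] < weight[best]))
                best = i;
        return best;
    }

    /* build the tree from per-symbol counts; returns the root's index */
    int build(long freq[NSYM])
    {
        int i, n, a, b;

        for (i = 0; i < NSYM; i++) {
            weight[i] = freq[i];
            left[i] = right[i] = -1;
            alive[i] = (freq[i] > 0);
        }
        for (n = NSYM; ; n++) {
            a = least(n, -1);            /* two least common live nodes */
            b = least(n, a);
            if (b < 0)
                return a;                /* one node left: the root */
            weight[n] = weight[a] + weight[b];
            left[n]  = a;                /* combine them under a parent */
            right[n] = b;
            alive[n] = 1;
            alive[a] = alive[b] = 0;     /* remove the sons from the list */
        }
    }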
The point is, by this process, every non-leaf node has exactly two
sons. So how do nodes with one son get involved? was my question.
Do they have a use? Or, because I built the tree from
reverse-engineering the code of a known text, have I just not found
out what the other son contains?
Clearer?
|
387.31 | | VIDEO::LEICHTERJ | Jerry Leichter | Thu Mar 12 1987 23:10 | 72 |
| Several previous notes contain (at least) two errors in fact about Huffman
encoding that need correction:
1. "Huffman compression needs either a fixed table, or a table
included in the text sent". This is true of the original
Huffman algorithm, but one can do better with adaptive
Huffman compression. In this technique, the two ends start
off with the same table, but collect statistics about
the characters that actually occur. As more is learned
about the file, the table is changed.
If you think about an implementation in terms of trees, it's
easy to see how this works. Start off with a balanced full
binary tree (equivalent to assuming that all characters are
equally likely). In each node, store a count; initially all
are the same. When a character has been sent/received, update
its count. Make sure the tree is still a "proper Huffman
tree". If not, re-order as necessary before processing
the next character.
The original Huffman algorithm is provably optimal among
all possible compression algorithms, subject to appropriate
constraints on how "large" the units of redundancy are.
Adaptive Huffman compression is NOT optimal, but converges
to optimal given a "regular enough" input.
"Optimal" here counts only the number of bits sent. In the non-
adaptive case, the table is NOT counted - it's assumed that
the frequency distribution is known at both ends. If your
measure of "optimal" includes some notion of efficient implemen-
tability on typical hardware, the variable-length codes of
Huffman compression cost you; LZW, which uses fixed-length
codes, but deals with a varying-width compression window, does
better.
2. It is NOT true that losing one bit from a Huffman-compressed message
screws up the rest. In fact, such a message has an interesting
redundancy property: If you simply continue "reading", you
are highly likely to fall back into proper sync after a couple
of codes. To see why, note that if you EVER, in reading, find
yourself "back in sync", you will STAY in sync, since the
decoding process has no memory. (The reason you can get away
with this is that no code can be a leading substring of another
code - think of the tree - only leaves are code words.)
Now suppose the distribution is "uneven", so that the Huffman
compression really helps. Then there is at least one "small"
code. It is small exactly because the symbol it codes for
occurs often; so the code occurs often. Every time the reader
reaches this small code, it is offset from the beginning of
the code by at most the number of bits in the code. So, if
the code is k bits long, there is a 1/k chance of being
aligned.
Note that this argument works just as well for errors that
lose a bit, insert a bit, or simply flip a bit. The more
skewed the original symbol distribution, the faster convergence
is likely to occur. In the worst case, with equal probabilities
and 2^k symbols, all the codes will be k bits long.
A single bit flipped will screw up only that code, but the
reader will not be able to recover from the insertion or loss
of a bit.
Of course, with the adaptive Huffman algorithm - or, in general,
ANY adaptive algorithm - screwing up a symbol may cause
the sender and receiver to update their tables differently,
thus leading to yet more errors. For adaptive Huffman as
described above, however, it may be that the tables will
eventually re-converge, assuming well-enough behaved message
distributions.
-- Jerry
|