
Data Compression Lec#8-2018

College: College of Science for Women     Department: Computer Science     Stage: 4
Instructor: علي كاظم محمد هداب الغرابات       21/05/2018 05:41:46
Dictionary Methods
Statistical compression methods use a statistical model of the data, which is why the quality of compression they achieve depends on how good that model is.
Dictionary-based compression methods do not use a statistical model, nor do they use variable-size codes. Instead they select strings of symbols and encode each string as a token using a dictionary.
String Compression
In general, compression methods based on strings of symbols can be more efficient than methods that compress individual symbols. In principle, better compression is possible when the symbols of the alphabet have very different probabilities of occurrence. We use a simple example to show that the probabilities of strings of symbols vary more than the probabilities of the individual symbols constituting the strings.
We start with a 2-symbol alphabet a1 and a2, with probabilities P1 = 0.8 and P2 = 0.2, respectively. The average probability is 0.5, and we can get an idea of the variance (how much the individual probabilities deviate from the average) by computing the sum of absolute differences: variance = |0.8 − 0.5| + |0.2 − 0.5| = 0.6.
Any variable-size code would assign 1-bit codes to the two symbols, so the average size of the code is one bit per symbol.
We now generate all the strings of two symbols. There are four of them, shown in Table 3.1a together with their probabilities and a set of Huffman codes.

Table 3.1a
String  Probability         Code
a1a1    0.8 × 0.8 = 0.64    0
a1a2    0.8 × 0.2 = 0.16    10
a2a1    0.2 × 0.8 = 0.16    110
a2a2    0.2 × 0.2 = 0.04    111

The average probability is 0.25, so a sum of absolute differences similar to the one above yields
variance = |0.64 − 0.25| + |0.16 − 0.25| + |0.16 − 0.25| + |0.04 − 0.25| = 0.78.
The average size of the Huffman code is 1 × 0.64 + 2 × 0.16 + 3 × 0.16 + 3 × 0.04 = 1.56 bits per string, which is 0.78 (= 1.56/2) bits per symbol.
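The arithmetic above is easy to verify directly. The following minimal Python sketch (using the code lengths 1, 2, 3, 3 implied by the Huffman code sizes in the text) recomputes the sum of absolute differences and the average code size for the four 2-symbol strings:

```python
# Probabilities of the four 2-symbol strings, from P1 = 0.8 and P2 = 0.2
probs = [0.8 * 0.8, 0.8 * 0.2, 0.2 * 0.8, 0.2 * 0.2]   # 0.64, 0.16, 0.16, 0.04
code_lengths = [1, 2, 3, 3]        # Huffman code sizes used in the text

avg = sum(probs) / len(probs)                               # 0.25
variance = sum(abs(p - avg) for p in probs)                 # sum of absolute differences
avg_bits = sum(l * p for l, p in zip(code_lengths, probs))  # bits per 2-symbol string

print(round(variance, 4))       # 0.78
print(round(avg_bits, 4))       # 1.56 bits per string
print(round(avg_bits / 2, 4))   # 0.78 bits per symbol
```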
In the next step we similarly create all eight strings of three symbols. They are shown in Table 3.1b together with their probabilities and a set of Huffman codes.

Table 3.1b
String   Probability  Code
a1a1a1   0.512        0
a1a1a2   0.128        100
a1a2a1   0.128        101
a2a1a1   0.128        110
a1a2a2   0.032        11100
a2a1a2   0.032        11101
a2a2a1   0.032        11110
a2a2a2   0.008        11111

The average probability is 0.125, so a sum of absolute differences similar to the ones above yields
variance = |0.512 − 0.125| + 3|0.128 − 0.125| + 3|0.032 − 0.125| + |0.008 − 0.125| = 0.792.
The average size of the Huffman code in this case is 1 × 0.512 + 3 × 3 × 0.128 + 3 × 5 × 0.032 + 5 × 0.008 = 2.184 bits per string, which is 0.728 (= 2.184/3) bits per symbol.
As we keep generating longer and longer strings, the probabilities of the strings differ more and more from their average, and the average code size gets better (Table 3.1c).
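The trend can be checked for several block lengths at once. The sketch below builds the Huffman code for every block length n from 1 to 5 and prints the average bits per symbol, using the standard fact that the average Huffman code length equals the sum of the weights of all internal (merge) nodes of the Huffman tree. The first three rows reproduce the values derived above (1.0, 0.78, and 0.728 bits per symbol); all values lie between the entropy of the source, about 0.722 bits, and 1 bit, although the decrease need not be monotonic at every n:

```python
import heapq
import math
from itertools import product

def huffman_avg_bits(probs):
    """Average Huffman code length in bits.

    Uses the fact that the average equals the sum of all merge-node weights."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b          # each merge adds one bit to every symbol below it
        heapq.heappush(heap, a + b)
    return total

p1, p2 = 0.8, 0.2
entropy = -(p1 * math.log2(p1) + p2 * math.log2(p2))   # about 0.7219 bits/symbol

results = []
for n in range(1, 6):
    # Probability of each n-symbol string = product of its symbol probabilities
    block_probs = [math.prod(c) for c in product([p1, p2], repeat=n)]
    results.append(huffman_avg_bits(block_probs) / n)
    print(n, round(results[-1], 4))   # n=1 -> 1.0, n=2 -> 0.78, n=3 -> 0.728
```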

LZ77 (Sliding Window)
The principle of this method (which is sometimes referred to as LZ1) [Ziv and Lempel 77] is to use part of the previously seen input stream as the dictionary. The encoder maintains a window to the input stream and shifts the input in that window from right to left as strings of symbols are being encoded.
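To illustrate the sliding-window idea (this is a simplified sketch, not the exact token format of the original paper), here is a toy LZ77 encoder/decoder in Python. Each token is (offset back into the search buffer, match length, next symbol), and the window and look-ahead sizes are small arbitrary values chosen for the example:

```python
def lz77_encode(data, window=16, lookahead=8):
    """Toy LZ77 encoder: emit (offset, length, next_symbol) tokens.

    offset is the distance back into the search buffer (0 = no match)."""
    i, tokens = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i into the look-ahead
            # (self-overlapping matches are legal in LZ77).
            while (length < lookahead and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens

def lz77_decode(tokens):
    out = []
    for off, length, sym in tokens:
        start = len(out) - off
        for k in range(length):   # byte-by-byte copy handles overlapping matches
            out.append(out[start + k])
        out.append(sym)
    return "".join(out)

text = "sir_sid_eastman_easily_teases"
tokens = lz77_encode(text)
print(tokens[:4])                    # the first tokens are literals, e.g. (0, 0, 's')
assert lz77_decode(tokens) == text   # round-trip check
```

The decoder copies matches one symbol at a time precisely so that a match may overlap the text it is still producing, which is how LZ77 compresses runs of repeated symbols.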
