انت هنا الان : شبكة جامعة بابل > موقع الكلية > نظام التعليم الالكتروني > مشاهدة المحاضرة

data compression unary code

الكلية كلية العلوم للبنات     القسم قسم الحاسبات     المرحلة 4
أستاذ المادة محمد عبيد مهدي الجبوري       3/18/2011 10:15:51 PM

Basic Concepts in Data Compression

 

          In this section the basic concepts of data compression are shown:-

 

 

 

 The Unary Code

 

The unary code of the non-negative integer n is defined as n-1 ones followed by one zero or, alternatively, as n-1 zeros followed by a single one.

 

 

Table : Some Unary Codes

n

 

Code

 

Alt. Code

 

1

 

0

 

1

 

2

 

10

 

01

 

3

 

110

 

001

 

4

 

1110

 

0001

 

5

 

11110

 

00001

 

 

 

 Entropy Coding

 

          We can define the entropy of a signal symbol ai as –Pi log2Pi. This is the smallest number of bits needed, on the average, to represent the symbol.

 

The amount of information contained in one, base-n symbol is:

 

 

 

 


This quantity is called the entropy of the data being transited.

 

The entropy of the data depends on the individual probabilities Pi, and is smallest when all n probabilities are equal.

 

Data Compression Strategies

 

            There are different ways that data compression techniques can be categorized, smith  gives a compression classification as below:

 

 

a. Lossless or Lossy

 

   Lossless

 

    Lossy

 

RLE

 

JPEG

 

Huffman

 

MPEG

 

Arithmetic

 

Vector quantization

 

Quadtree

 

 

 

b. Fixed or variable group size

 

Method

 

Group Size

 

Input

 

Output

 

Huffman

 

Fixed

 

Variable

 

Arithmetic

 

Variable

 

Variable

 

RLE, LZW

 

Variable

 

Fixed

 

 

Most data compression programs operate by taking a group of data from the original file and compressed it in some way, and then writing the compressed group to the output file.

 

 

1.1.2 Irreversible Text Compression

 

          This method used to compress text by throwing away some information .the decompressed text will not be identical to the original, so such methods are not general purpose, they can be used in special cases.

 

Ex1//A run of consecutive blank spaces may be replaced by a single space.

 

 

Ex2// In extreme cases all text characters except letters and spaces may be thrown away, and the letters may be case flattened (converted to all lower- or all uppercase). This will leave just 27 symbols, so a symbol can be encoded in 5 instead of the usual 8 bits.

 

The compression ratio is 5/8 = .625, not bad, but the loss may normally be too great.

 

 Ad Hoc Text Compression

 

If the text contains many spaces but they are not clustered, they may be removed and their positions indicated by a bit-string that contains a 0 for each text character that is not a space and a 1 for each space. Thus, the text Here are some ideas, is encoded as the bit-string “0000100010000100000” followed by the text

 

Herearesomeideas.

 

If the number of blank spaces is small, the bit-string will be sparse.

 

 

 Run-Length Encoding

 

The idea behind this approach to data compression is this: If a data item d occurs n consecutive times in the input stream, replace the n occurrences with the single pair nd. The n consecutive occurrences of a data item are called a run length of n, and this approach to data compression is called run length encoding or RLE. We apply this idea first to text compression then to image compression.

 

 

a) RLE Text Compression

 

 

Just replacing “2._all_is_too_well” with “2._a2_is_t2_we2” will not work. Clearly, the decompressor should have a way to tell that the first “2” is part of the text while the others are repetition factors for the letters “o” and “l”. Even the string “2._a2l_is t2o_we2l” does not solve this problem (and also does not provide any compression). One way to solve this problem is to precede each repetition with a special escape character.

 

If we use the character “@” as the escape character, then the string “2._a@2l_is_t@2o we@2l” can be decompressed unambiguously. However, it is longer than the original string, since it replaces two consecutive letters with three characters. We have to adopt the convention that only three or more repetitions of the same character will be replaced with a repetition factor. Figure 1.3a is a flowchart for such a simple run-length text compressor.

 

After reading the first character, the count is 1 and the character is saved. Subsequent characters are compared with the one already saved and, if they are identical to it, the repeat-count is incremented. When a different character is read, the operation depends on the value of the repeat count. If it is small, the saved character is written on the compressed file and the newly read character is saved. Otherwise, an “@” is written, followed by the repeat-count and the saved character.

 

Decompression is also straightforward. It is shown in Figure 1.3b. When an “@” is read, the repetition count n and the actual character are immediately read, and the character is written n times on the output stream.

 

 

 


المادة المعروضة اعلاه هي مدخل الى المحاضرة المرفوعة بواسطة استاذ(ة) المادة . وقد تبدو لك غير متكاملة . حيث يضع استاذ المادة في بعض الاحيان فقط الجزء الاول من المحاضرة من اجل الاطلاع على ما ستقوم بتحميله لاحقا . في نظام التعليم الالكتروني نوفر هذه الخدمة لكي نبقيك على اطلاع حول محتوى الملف الذي ستقوم بتحميله .