انت هنا الان : شبكة جامعة بابل > موقع الكلية > نظام التعليم الالكتروني > مشاهدة المحاضرة

oring code

الكلية كلية العلوم للبنات     القسم قسم الحاسبات     المرحلة 4
أستاذ المادة محمد عبيد مهدي الجبوري       3/11/2012 5:45:31 PM
Sparse Strings
Regardless of what the input data represents—text, binary, images, or anything else—we
can think of the input stream as a string of bits. If most of the bits are zeros, the string
is sparse. Sparse strings can be compressed very efficiently, and this section describes
methods developed specifically for this task. Before getting to the individual methods
it may be useful to convince the reader that sparse strings are not a theoretical concept
but do occur commonly in practice. Here are some examples.
1. A drawing. Imagine a drawing, technical or artistic, done with a black pen on
white paper. If the drawing is not very complex, most of it remains white. When
such a drawing is scanned and digitized, most of the resulting pixels are white, and the
percentage of black ones is small. The resulting bitmap is an example of a sparse string.
2. A bitmap index for a large data base. Imagine a large data base of text documents.
A bitmap for such a data base is a set of bit-strings (or bitvectors) that makes it easy
to identify all the documents where a given word w appears. To implement a bitmap,
we first have to prepare a list of all the distinct words wj in all the documents. Suppose
that there are W such words. The next step is to go over each document di and prepare
a bit-string Di that is W bits long, containing a 1 in position j if word wj appears in
document di. The bitmap is the set of all those bit-strings. Depending on the database,
such bit-strings may be sparse.
(Indexing large databases is an important operation, since a computerized database
should be easy to search. The traditional method of indexing is to prepare a concordance.
Originally, the word concordance referred to any comprehensive index of the Bible, but
today there are concordances for the collected works of Shakespeare, Wordsworth, and
many others. A computerized concordance is called an inverted file. Such a file includes
one entry for each term in the documents constituting the database. An entry is a list
of pointers to all the occurrences of the term, similar to an index of a book. A pointer
may be the number of a chapter, of a section, a page, a page-line pair, of a sentence, or
even of a word. An inverted file where pointers point to individual words is considered
fine grained. One where they point to, say, a chapter is considered coarse grained. A
fine-grained inverted file may seem preferable, but it must use large pointers, and as a
result, it may turn out to be so large that it may have to be stored in compressed form.)
3. Sparse strings have also been mentioned in Section 4.8.3, in connection with JPEG.
The methods described here (except prefix compression, Section 8.5.5) are due to
[Fraenkel 85]. Section 2.4 discusses the Golomb codes and illustrates their application
to sparse strings.

المادة المعروضة اعلاه هي مدخل الى المحاضرة المرفوعة بواسطة استاذ(ة) المادة . وقد تبدو لك غير متكاملة . حيث يضع استاذ المادة في بعض الاحيان فقط الجزء الاول من المحاضرة من اجل الاطلاع على ما ستقوم بتحميله لاحقا . في نظام التعليم الالكتروني نوفر هذه الخدمة لكي نبقيك على اطلاع حول محتوى الملف الذي ستقوم بتحميله .