انت هنا الان : شبكة جامعة بابل > موقع الكلية > نظام التعليم الالكتروني > مشاهدة المحاضرة

Hash Table

الكلية كلية تكنولوجيا المعلومات     القسم قسم البرامجيات     المرحلة 3
أستاذ المادة أسعد صباح هادي الجبوري       12/27/2011 7:44:35 PM
Hash Table
The central idea of a hash table is to map each of a large space of possible names that might be entered into a symbol table to one of a fixed number of positions in a hash-table. This mapping is done by a hash function.
A hash function is normally assumed to have the following properties:-
1- h(n) depends on n.
2- h can be computed quickly
3- h is uniform and randomizing in mapping names to hash addresses. That is all hash addresses are mapped with equal probability, and similar names don’t cluster to the same hash addresses.
Some hash functions treat a name as sequence of words, with some number of characters per word. Names longer than one word are folded together into one word, usually by exclusive or operations or by multiplying together two n bit words and keeping the middle n bits of the product. The hash value is then obtained by taking the remainder modulo m, where the hash table has m entries. Note that if m is equal to 2b, this division simply isolates the rightmost b bits. Thus, such table sizes should be avoided.
An alternative is to compute a hash value character by character, as a token is scanned. Simple hash function include (c1 + c2 +… + cn) mod m or (c1* c2 * c3 * … *cn) mod m, where the token is composed of characters c1, c2, …, cn , though care must be taken to avoid or handle overflows in doing such computations.
?
The fig (32) below show a hash table.









Fig (32) shows two tables, a hash table and a storage table. The hash table consist of k words, numbered 0, 1 … k-1. These words are pointers into the storage table to the heads of k separate linked lists (some lists may be empty). Each record in the symbol table appears on exactly one of these lists
To determine whether NAME is in the symbol table, we apply to NAME a hash function h such that h(NAME) is an integer between 0 and k-1. It is on the list numbered h(NAME) that the record for NAME belongs. To inquire about NAME, we compute h(NAME) and search that list only. To enter NAME into the symbol table, we create a record for it at the first available place in the storage table and link that record to the beginning of the h(NAME) th list. Since the average list is n/k records long if there are n names in the table, we have cut our searching work down by a factor of k. As k can as large as we like, we can choose k sufficiently large that n/k will be small for even very large programs.
Resolving Collisions
Because the number of the possible names that can be entered into a symbol table is usually mush larger than the number of hash addresses, collisions can occur. That is for names n1 and n2 (n1?n2) but h(n1) =h(n2), when such a collisions occurs, a number of collisions-handling techniques are possible :-
1- Liner Resolution
If position h(n) is occupied, try (h(n)+1) mod m , (h(n)+2) mod m , and so on. If any table positions are free, they will be found eventually. The main problem with this technique is that as the table fills, long chains tend to form.
2- Add-the-Hash Rehash
If h(n) is occupied, try (2*h(n)) mod m, (3*h(n)) mod m and so on. This helps prevent long chains, but m must be prime if all hash positions are to be eventually tried.
3- Quadratic Rehash
If h(n) is occupied, try (h(n)+1*2) mod m, (h(n)+2*2) mod m, and so on.
4- Collision Resolution by Chaining
Names are not placed in the hash table at all, but rather records for all names that hash to a given value, are chained together on a linked list. Only list headers are stored in the hash table itself as shown in fig (33).





0


.
.
.
k-1









Symbol Value Hash Code
Ali 14025 0
Amir 45012 5
Emad 34004 2
Sami 45019 2
Samir 45009 7
Zohair 15015 0
Talib 25014 1
farid 25014 3
Wahab 14004 4
Saad 24005 6

0 Ali 14025
1 Talib 25014
2 Emad 34003
3 Farid 25014
4 Wahab 14004
5 Amir 45012
6 Saad 24005
7 Samir 45009










(a) Symbols, values & hash code derived from symbols.
(b) Eight entry hash table with linked lists of symbols & values.
_____________________________
Syntax Analyzer (Parser)
We show that the lexical structure of tokens could be specified by regular expressions and that form a regular expressions we could automatically construct a lexical analyzer to recognize the tokens denoted by the expression. We shall use a notation called a context-free grammar or grammar only, for the syntactic specification of a programming language which is also called a BNF (Backus-Naur Form) description.
The parser obtains a string of tokens from the lexical analyzer, and verifies that the string can be generated by the grammar for the source language. The parser report any syntax errors and give an indication of the type of this errors. Fig (35) shows the position of parser.








Fig (35) position of parser in compiler model.
?
Context Free Grammars (CFG s)
It is natural to define certain programming-language construct recursively. For example, we might state:-
If S1 and S2 are statements and E is an expression, then:-
"If E then S1 else S2" is a statement ------------- (1)
Or: If S1, S2,…, Sn are statements then
"begin S1; S2 ;…;Sn end" is a statement --------------(2)
As a third example:
If E1 and E2 are expressions, then "E1+E2" is an expression ----------- (3)
If we use the syntactic category "statement" to denote the class of the statements and "expression" to denote the class of expressions, then (1) can be expressed by this production:
Statement if expression then statement else statement ------- (4)
And (3) can be written as :-
Expression expression + expression ----------- (5)
If we want to write (2) in the same way we may have :
Statement begin statement; statement; … ; statement end
But the use of ellipses (…) would create problems when we attempt to define translations based on this description. For this reason, we require that each rewriting rule (production) have a known number of symbols, with no ellipses permitted.

To express (2) by rewriting rules, we can introduce a new syntactic category "statement-list" denoting any sequence of statements separated by semicolons. Then we can write:-

Statement begin statement-list end
Statement-list statement | statement; statement-list ----------- (6)
A set of rules such that (6) is an example of a grammar. In general, a grammar involves four quantities: - start symbol, terminals, non-terminals, productions (rewriting rules).
EX: consider the following grammar for simple arithmetic expressions. The non-terminal symbols are expressions and operator, with expression the start symbol. The terminal symbols are: id, +, -, *, /, (, ), ?
The productions are
Expression Expression operator Expression
Expression (Expression)
Expression - Expression
Expression id
Operator +
Operator -
Operator *
Operator /
Operator ?
We can write these productions in this form :-
E EAE| (E) | -E| id
A + | - | / | ?| *



المادة المعروضة اعلاه هي مدخل الى المحاضرة المرفوعة بواسطة استاذ(ة) المادة . وقد تبدو لك غير متكاملة . حيث يضع استاذ المادة في بعض الاحيان فقط الجزء الاول من المحاضرة من اجل الاطلاع على ما ستقوم بتحميله لاحقا . في نظام التعليم الالكتروني نوفر هذه الخدمة لكي نبقيك على اطلاع حول محتوى الملف الذي ستقوم بتحميله .