انت هنا الان : شبكة جامعة بابل > موقع الكلية > نظام التعليم الالكتروني > مشاهدة المحاضرة
الكلية كلية تكنولوجيا المعلومات
القسم قسم البرامجيات
المرحلة 3
أستاذ المادة أسعد صباح هادي الجبوري
12/27/2011 4:28:44 PM
Definition of Regular Expression After the definition of the string and languages, we are ready to describe regular expressions, the notation we shall use to define the class of languages known as regular sets. Recall that a token is either a single string (such as a punctuation symbol) or one of a collection of string of a certain type (such as an identifier). If we view the set of strings in each token class as a language, we can use the regular-expression notation to describe tokens. In regular expression notation we could write the definition for identifier as:- Identifier= letter (letter?digit)* The vertical bar means "or" that is union, the parentheses are used to group sub expressions, and the star is the closure operator meaning "zero or more instances". What we call the regular expression over alphabet ? are exactly those expressions that can be constructed from the following rules. Each regular expression denotes a language and we gives the rules for construction of the denoted languages along with the regular-expression construction rules. 1- ? Is a regular expression denoting {?}, that is, the language consisting only the empty string. 2- For each a in ?, a is a regular expression denoting {a}, the language with only one string, that string consisting of the single symbol a. 3- If R and S are regular expression denoting language LR and LS , respectively, then:-
i) (R)?(S) is a regular expression denoting LR U LS ii) (R) . (S) is a regular expression denoting LR . LS iii) (R)* is a regular expression denoting L*R We have shown regular expression formed with parentheses whenever possible. In fact, we eliminate them when we can, using the precedence rules that * has highest precedence, then comes ., and ?has lowest precedence. Let us assume that our alphabet ? is {a, b}. The regular expression a denotes {a}, which is different from just the string a. 1- The regular expression a* denotes the closure of the language {a}, that is a*=U{ai} The set of all strings of zero or more a s. The regular expression aa*, which by our precedence rules is parsed a(a)*, denote the strings of one or more a s. We may use a+ for aa*
2- What does the regular expression (a?b)* denote? We see that a?b denotes {a, b}, the language with two string a and b. Thus (a?b)* denote U{a, b}i Which is just the set of all string of a s and b s including the empty string. The regular expression (a*b*)* denote the same set.
3- The expression a?ba* is grouped a ?(b (a)*), and denotes the set of strings consisting of either a single "a" or "b" followed by zero or more a s. 4- The expression aa?ab?ba?bb denotes all strings of length two, so (aa?ab?ba?bb)* denotes all strings of even length. Note that ? is a string of length zero. 5- ??a?b denotes strings of length zero or one. Example: The token discussed in fig (5), can be described by regular expression as follows: Keyword=BEGIN?END?IF?THEN?ELSE Identifier=letter (letter?digit)* Constant=digit* Relops= <?<=?=?< >?>?>= Where letter stands for A?B?…?Z, and digit stands for 0?1?…?9. If two regular expression R and S denote the same language, we write R=S, and say that R and S are equivalent. For example, we previously observed that (a?b)*= (a*b*)*. For any regular expression R, S and T, the following axioms hold:- 1- R?S= S?R (?is commutative) 2- R?(S?T)=(R?S) ?T (?is associative) 3- R (ST) = (RS) T ( . is associative) 4- R(S?T) = RS?RT and (S?T) R= SR?TR ( . distributes over 1) 5- ?R=R?=R (? is the identity for concatenation)
Finite Automata A recognizer for a language L is a program takes as input a string x and answer "yes" if x is a sentence of L on "no" otherwise. Clearly, the part of a lexical analyzer that identifies the presence of a token on the input is a recognizer for the language defining that token. Suppose we have specific a language by a regular expression R, and we are given some string x. We want to know whether x is in the language L denoted by R. One way to attempt this test is to check that x can be decomposed into a sequence of substrings denoted by the primitive sub expressions in R. Suppose R is (a?b)*abb, the set of all strings ending in abb and x is the string aabb. We see that R=R1R2, where R1 = (a?b)* and R2 = abb. We can verify that a is an element of the language denoted by R1 and that abb similarly match R2. In this way, we show that abb is in the language denoted by R.
Non Deterministic Automata A better way to convert a regular expression to a recognizer is to construct a generalized transition diagram from the expression. This diagram is called nondeterministic finite automata. A nondeterministic finite automata recognizing the language (a?b)*abb is shown in fig (7).
The NFA is a labeled directed graph. The nodes are called states. and the labeled edges are called transitions. The NFA looks almost like a transition diagram, but edges can be labeled by ? as well as characters, and the some character cal label two or more transitions out of one state. One state (0 in fig (7)) is distinguished as the start state, and one or more states may be distinguished as accepting states (or final states). State 3 in fig (7) is the final state. The transitions of an NFA can be conveniently represented in tabular form by means of a transition table. The transition table for the NFA of fig (7) is shown in fig (8). In the transition table there is a row for each state and a column for each input symbol. The entry for row 1 and symbol a is the set of possible next state for state 1 on input a State Input symbol a b 0 {0,1} {0} 1 ---- {2} 2 ---- {3}
Fig (8) Transition Table The NFA accepts an input string x if and only if there is a path from the start state to some accepting state, such that labels along that path spell out x. If the input string is aabb, then we can show this sequence of moves:- State Remaining input 0 aabb 0 abb 1 bb 2 b
3
In Fig (9) below we can see an NFA to recognize aa*?bb*. String aaa is accepted by going through states 0, 1, 2, 2, and 2. The labels of these edges are ?, a, a and a, whose concatenation is aaa.
Fig (9) NFA accepting aa*?bb*. ــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــــ Deterministic Automata The NFA shown in fig (8) has more than one transition from state 0 on input a, that is, it may go to state 0 or 1. Similarly, the NFA of fig (9) has two transitions on ? from state 0. These situations are the reason why it is hard to simulate an NFA with a computer program. The deterministic finite automata has at most one path from the start state labeled by any string. The finite automaton is deterministic if 1- It has no transitions on input ? 2- For each state s and input symbol a, there is at most one edge labeled a leaving s. Example: in fig (10) below we see a deterministic finite automata (DFA) accepting the language (a?b)*abb, which is the same language as that accepted by the NFA of fig (7)
Fig (10) DFA accepting (a?b)*abb Since there is at most one transition out of any state on any symbol, a DFA is easier to simulate by a program than an NFA. How to Build a Lexical Analyzer Step1 Convert the Grammar into Transition Diagram. Step2 Convert the Regular Expression into Nondeterministic Finite State Automata. Step3 Convert the NFA into DFA. Step4 Minimize Finite State Automata. Step5 Write an efficient program for the minimized finite state automata, called (minimized finite state automata recognizer).
المادة المعروضة اعلاه هي مدخل الى المحاضرة المرفوعة بواسطة استاذ(ة) المادة . وقد تبدو لك غير متكاملة . حيث يضع استاذ المادة في بعض الاحيان فقط الجزء الاول من المحاضرة من اجل الاطلاع على ما ستقوم بتحميله لاحقا . في نظام التعليم الالكتروني نوفر هذه الخدمة لكي نبقيك على اطلاع حول محتوى الملف الذي ستقوم بتحميله .
|