College: College of Information Technology
Department: Department of Software
Stage: 3
Course instructor: Asaad Sabah Hadi Al-Jubouri
12/27/2011 7:36:03 PM
The Language for Specifying a Lexical Analyzer

We shall now study how to build a lexical analyzer from a specification of tokens in the form of a list of regular expressions. The discussion centers on the design of an existing tool called LEX for automatically generating a lexical analyzer program. A LEX source program is a specification of a lexical analyzer, consisting of a set of regular expressions together with an action for each regular expression. The action is a piece of code which is to be executed whenever a token specified by the corresponding regular expression is recognized. The output of LEX is a lexical analyzer program constructed from the LEX source specification.

Unlike most programming languages, a source program for LEX does not supply all the details of the intended computation. Rather, LEX supplies with its output a program that simulates a finite automaton. This program takes a transition table as data; the transition table is the portion of LEX's output that stems directly from LEX's input. The situation is shown in fig (23), where the lexical analyzer L is the transition table plus the program that simulates an arbitrary finite automaton expressed as a transition table. Only L is to be included in the compiler being built.
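The idea of a fixed driver program plus a transition table can be sketched as follows. This is an illustration only, not LEX's actual output: the table below is a hand-written DFA for identifiers of the form Letter (Letter | Digit)*, and the state numbering is invented for the example.

```python
# A sketch of the driver L described above: a fixed program that simulates
# any finite automaton supplied to it as a transition table. The table here
# is illustrative (a DFA for identifiers), not real LEX output.

def kind(ch):
    # Classify an input character; treats any letter as "Letter" for simplicity.
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# Transition table: state -> input class -> next state.
TABLE = {
    0: {"letter": 1},               # start state: must see a letter first
    1: {"letter": 1, "digit": 1},   # accepting: letters and digits may repeat
}
ACCEPTING = {1}

def simulate(text):
    """Run the table-driven automaton; return True if text is accepted."""
    state = 0
    for ch in text:
        state = TABLE.get(state, {}).get(kind(ch))
        if state is None:          # no transition: the automaton rejects
            return False
    return state in ACCEPTING

print(simulate("X25"))   # True: a letter followed by letters/digits
print(simulate("9AB"))   # False: an identifier cannot begin with a digit
```

Note that only `TABLE` and `ACCEPTING` change from token class to token class; the `simulate` driver stays the same, which is exactly the division of labor between LEX's generated table and the fixed simulator program.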
A LEX source program consists of two parts: a sequence of auxiliary definitions followed by a sequence of translation rules.

Auxiliary Definitions

The auxiliary definitions are statements of the form:

D1 = R1
D2 = R2
...
Dn = Rn

where each Di is a distinct name, and each Ri is a regular expression whose symbols are chosen from Σ ∪ {D1, D2, ..., Di-1}, i.e., characters or previously defined names. The Di's are shorthand names for regular expressions, and Σ is our input symbol alphabet.

Example: We can define the class of identifiers for a typical programming language with the following sequence of auxiliary definitions:

Letter = A | B | ... | Z
Digit = 0 | 1 | ... | 9
Identifier = Letter (Letter | Digit)*

Translation Rules

The translation rules of a LEX program are statements of the form:

P1 {A1}
P2 {A2}
...
Pm {Am}

where each Pi is a regular expression called a pattern, over the alphabet consisting of Σ and the auxiliary definition names. The patterns describe the form of the tokens. Each Ai is a program fragment describing what action the lexical analyzer should take when a token matching Pi is found. The Ai's are written in a conventional programming language; rather than commit to any particular language, we use a pseudo-language. To create the lexical analyzer L, each of the Ai's must be compiled into machine code.

The lexical analyzer L created by LEX behaves in the following manner. L reads its input, one character at a time, until it has found the longest prefix of the input which matches one of the regular expressions Pi. Once L has found that prefix, L removes it from the input and places it in a buffer called TOKEN. (Actually, TOKEN may be a pair of pointers to the beginning and end of the matched string in the input buffer itself.) L then executes the action Ai. It is possible that none of the regular expressions denoting the tokens matches any prefix of the input. In that case an error has occurred, and L transfers control to some error-handling routine. It is also possible that two or more patterns match the same longest prefix of the remaining input. If that is the case, L will break the tie in favor of the token which comes first in the list of translation rules.

Example: Consider the collection of tokens defined in fig (5); the corresponding LEX program is shown in fig (24).

AUXILIARY DEFINITIONS
Letter = A | B | ... | Z
Digit = 0 | 1 | ... | 9

TRANSLATION RULES
BEGIN                      {return 1}
END                        {return 2}
IF                         {return 3}
THEN                       {return 4}
ELSE                       {return 5}
Letter (Letter | Digit)*   {LEXVAL := INSTALL( ); return 6}
Digit+                     {LEXVAL := INSTALL( ); return 7}
<                          {LEXVAL := 1; return 8}
<=                         {LEXVAL := 2; return 8}
=                          {LEXVAL := 3; return 8}
<>                         {LEXVAL := 4; return 8}
>                          {LEXVAL := 5; return 8}
>=                         {LEXVAL := 6; return 8}

Suppose the lexical analyzer resulting from the above rules is given the input BEGIN followed by a blank.
Both the first and the sixth patterns match BEGIN, and no pattern matches a longer string. Since the pattern for the keyword BEGIN precedes the pattern for identifiers in the list above, the conflict is resolved in favor of the keyword. For another example, suppose <= are the first two characters read. While the pattern < matches the first character, it is not the longest pattern matching a prefix of the input. Thus LEX's strategy of selecting the longest prefix that matches a pattern makes it easy to resolve the conflict between < and <=, by choosing <= as the next token.
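The two disambiguation rules just described, longest match first and earliest rule on a tie, can be sketched in Python. This is a simulation of the discipline, not LEX itself; the rule names and the mapping of patterns to Python regular expressions are illustrative.

```python
import re

# Patterns follow fig (24): keywords first, then identifiers, numbers,
# and relational operators. Rule order matters for tie-breaking.
RULES = [
    ("BEGIN", r"BEGIN"),
    ("END",   r"END"),
    ("IF",    r"IF"),
    ("THEN",  r"THEN"),
    ("ELSE",  r"ELSE"),
    ("id",    r"[A-Z]([A-Z]|[0-9])*"),
    ("num",   r"[0-9]+"),
    ("relop", r"<=|<>|<|>=|>|="),
]

def next_token(text):
    """Return (rule_name, lexeme) for the longest matching prefix.

    Ties on length go to the rule listed first, because a later match
    replaces the current best only when it is strictly longer.
    """
    best = None  # (length, rule_name, lexeme)
    for name, pattern in RULES:
        m = re.match(pattern, text)
        if m and (best is None or len(m.group()) > best[0]):
            best = (len(m.group()), name, m.group())
    if best is None:
        raise ValueError("no pattern matches any prefix: error routine would run")
    return best[1], best[2]

print(next_token("BEGIN "))  # ('BEGIN', 'BEGIN'): keyword rule beats identifier
print(next_token("<=X"))     # ('relop', '<='): longest prefix wins over '<'
```

Running it on the two examples from the text reproduces both resolutions: BEGIN is tokenized as the keyword rather than an identifier, and <= is preferred to < because it is the longer matching prefix.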