انت هنا الان : شبكة جامعة بابل > موقع الكلية > نظام التعليم الالكتروني > مشاهدة المحاضرة

simplification of CFG

الكلية كلية العلوم للبنات     القسم قسم الحاسبات     المرحلة 2
أستاذ المادة محمد عبيد مهدي الجبوري       3/18/2011 9:47:18 PM

Simplification of 

 

Context-Free Grammars

 

Simplifying Context-Free Grammars

 

 Methods for Transforming Grammars 

 

            o Substitution Rule 

 

            o Removing useless productions

 

            o Removing ?-productions

 

            o Removing unit productions

 

 Chomsky Normal Form 

 

 

Simplifying Context-Free Grammars; Normal Forms 

 

Preface

 

We found already in the last section on CFGs and Parsing, that parsing can be pretty time- and space-consuming. In order to construct good and efficient parsing algorithms, we consider now transformations of grammars, essentially simplifications of the form of productions, which lead to certain normal forms for grammars. These normal forms are better to deal with, and they allow more efficient parsing algorithms.

 

Methods for Transforming Grammars 

 

Substitution Rule 

 

If a grammar contains a production 

 

 A ? x1 B x2  with  A, B I V  and  A ? B

 

and 

 

 B ? y1 | y2 | ... | yn   (alternatives!)

 

then we can construct a new grammar G with a modified set of productions P , in which we replace the rules for A and B above with one set of rules:

 

 A ? x1 y1 x2  | x1 y2 x2  | ... | x1 yn x2

 

G and G are equivalent, i.e. the generate the same language.

 

Removing useless productions

 

Definition Useful Variable

 

A variable A is called useful if there is at least one wIL(G) such that

 

   S?*xAy ?* w     with  x, y I (V E T)*  

 

Otherwise, it is called useless.

 

A production is useless, if it involves a useless variable.

 

Note that the concept of useless variable includes the case that the variable does not lead to a terminal string, and the case that the variable cannot be reached from the start symbol. The elimination of useless variables and productions from a grammar (or the selection of those variables and productions, which are useful) proceeds in two phases:

 

A. Determine and select those variables, which can lead to a terminal string, and subsequently determine and select the related productions.

 

B. Determine and select those variables, which can be reached from the start symbol, and select related productions. 

 

Algorithm  - find useful variables and productions leading to wIT*

 

          1. Set V1 = ?

 

          2. Repeat until closure:

 

For every  AIV  with a production   

 

 A ? x1 x2 ... xn   and    xi I (V1 E T)

 

add A to V1.

 

3. Set P1 as the subset of productions from P, which contain only symbols from (V1 E T). 

 

4. This defines a new grammar G1 = (V1, T, S, P1)

 

 

Algorithm  - eliminate unreachable variables and related productions 

 

Draw a Dependency Graph for G1, above. The dependency graph is a graph with vertices labelled with variables and arcs going from vertex A to a vertex B, if the grammar contains a production A ? x1 B x2.

 

Eliminate from G1 all variables and related productions, which cannot be reached from the start symbol. This gives the new grammar G , which does not contain useless variables or productions.  

 

 

Removing l-productions

 

A production   A ? l

 

   is called a  l-production.

 

Definition Nullable Variable

 

A variable  A  is called nullable, if there is a derivation 

 

      A ?*  l

 

If G is a CFG and ??L(G), then there exists an equivalent grammar G without ?-productions. 

 

Proof  by construction 

 

This is also an algorithm to eliminate ?-productions from a given grammar.

 

First, we find all nullable variables of G, and put them into the set VN.

 

1. VN = ?

 

2. For all ?-production  A ? ? , add  A to VN.

 

3. Repeat until closure:

 

For all productions   

 

 B ? A1 A2 ... An   and    Ai I VN  for i = 1,...,n

 

add  B to VN.

 

4. Determine P based on the productions from P in the following way:

 

For each production  

 

  A ? x1 x2 ... xm  , m?1, and   xi I(VET) 

 

we add to P this production, as well as all productions, which we can construct from this production by substituting nullable variables xi with ?, in all possible combinations, which is equivalent to removing one or the other or all of those variables, i.e. for two nullable variable on the RHS, we remove one, remove the other, remove both, remove none. This leads to a bunch of productions, which we collect as P .  ?-productions A ? ? will  not be included in P .

 

This defines an equivalent grammar G = (V, T, S, P ), i.e.  L(G)=L(G ), without ?-productions.

 

Removing Unit Productions

 

Definition Unit-Production

 

A production of the form  A ? B  with A, BIV  is called a unit-production.

 

For any CFG G without ?-production, we can construct an equivalent grammar G without unit-productions.

 

1. We set up the dependency graph - for unit-productions only - and determine all dependencies 

 

  2. Then, we determine a subset P of P, which contains all non-unit      productions of P. 

 

3. This subset P we extend in the following way:

 

For all A, B with B dependent on A as above, and productions for A in P :

 

 A ? y1 | y2 |... | yn  

 

we add new productions to P :

 

 B ? y1 | y2 |... | yn 

 

This way, you eliminate unit-productions by substituting the right-side of a unit-production - even if it goes over several derivation steps  A  ?  ... ? B - with all possible options for A s productions from productions in P , i.e. not unit-productions.

 

You can imagine that this is the same as if you "inherit" the proper productions of A down to B. This does not change the language since B could have been generated by A without any additional terminals or variables around it (because A ?  ... ? B), and thus it is legitimate to consider B in this sense as equivalent to A.

 

Chomsky Normal Form 

 

Definition Chomsky Normal Form (CNF) 

 

A grammar is in Chomsky Normal Form (CNF), if it has only productions of the form:

 

A ? B C or 

 

A ? a

 

with A, B, C IV and aIT.

 

This means all rules are consisting of either a single terminal on the production, or they are binary, consisting of two variables.

 

 

 

DONE

 

Instructor: Mohamed U. Mahdi

 


المادة المعروضة اعلاه هي مدخل الى المحاضرة المرفوعة بواسطة استاذ(ة) المادة . وقد تبدو لك غير متكاملة . حيث يضع استاذ المادة في بعض الاحيان فقط الجزء الاول من المحاضرة من اجل الاطلاع على ما ستقوم بتحميله لاحقا . في نظام التعليم الالكتروني نوفر هذه الخدمة لكي نبقيك على اطلاع حول محتوى الملف الذي ستقوم بتحميله .