Regular Expressions And Languages

Introduction: In this section, we: define several operations on languages; say what regular expressions are, what they mean, and what regular languages are; and begin to show how regular expressions can be processed by Forlan.

The union, intersection and set-difference operations on sets are also operations on languages, i.e., if L1;L2 2 Lan, then L1 [ L2, L1 \ L2 and L1 ¡ L2 are all languages. (Since L1;L2 2 Lan, we have that L1 µ §¤1 and L2 µ §¤2 , for alphabets §1 and §2. Let § = §1 [ §2, so that § is an alphabet, L1 µ §¤ and L2 µ §¤. Thus L1 [ L2, L1 \ L2 and L1 ¡L2 are all subsets of §¤, and so are all languages.)

The first new operation on languages is language concatenation. The concatenation of languages L1 and L2 (L1 @ L2) is the language

f x1 @ x2 j x1 2 L1 and x2 2 L2 g:

I.e., L1 @ L2 consists of all strings that can be formed by concatenating an element of L1 with an element of L2. For example,

{ab; abc} @ {cd; d} = {(ab)(cd); (ab)(d); (abc)(cd); (abc)(d)} = {abcd; abd; abccd}.

Concatenation of languages is associative: for all L1;L2;L3 2 Lan,

(L1 @ L2) @ L3 = L1 @ (L2 @ L3):

And, {%} is the identify for concatenation: for all L2 Lan,

{%} @ L = L @ {%} = L:

Furthermore, ; is the zero for concatenation: for all L2 Lan,

; @ L = L @ ; = ;:

We often abbreviate L1 @ L2 to L1L2.

Now that we know what language concatenation is, we can say what it means to raise a language to a power. We define the language Ln formed byraising language L to a power n 2 N by recursion on n:

L0 = {%}; for all L2 Lan;

Ln 1= LLn; for all L2 Lan and n 2 N:

We assign this operation higher precedence than concatenation, so that LLn means L(Ln) in the above definition. For example, we have that

{a; b}2= {a; b}{a; b}1= {a; b}{a; b}{a; b}0

= {a; b}{a; b}{%} = {a; b}{a; b}

= {aa; ab; ba; bb}:

Proposition:1 For all L 2 Lan and n;m 2 N, Ln m = LnLm.

Proof: An easy mathematical induction on n. The language L and the natural number m can be fixed at the beginning of the proof. 2Thus, if L ∈Lan and n ∈N, then

Ln 1 = L Ln(definition); and

Ln 1 = LnL1= Ln L (Proposition 1):

Another useful fact about language exponentiation is:

Proposition 2: For all w 2 Str and n ∈N, {w}n= {wn}.

Proof:By mathematical induction on n. 2

For example, we have that {01}4= {(01)4} = {01010101}.

Now we consider a language operation that is named after Stephen Cole Kleene, one of the founders of formal language theory. The Kleene closure(or just closure) of a language L (L*) is the language

∪{Lnj n ∈N}:

Thus, for all w,

w ∈L*   iff w ∈A; for some A ∈{Ln j n ∈N}

            iff w ∈Ln for some n ∈N:

Or, in other words:

  • L* = L0 ∪L1 ∪L2 ∪. . .
  • L* consists of all strings that can be formed by concatenating together some number (maybe none) of elements of L (the same element of L can be used as many times as is desired).