# Regular Expressions And Languages

**Introduction:** In this section, we: define several operations on languages; say what regular expressions are, what they mean, and what regular languages are; and begin to show how regular expressions can be processed by Forlan.

The union, intersection and set-difference operations on sets are also operations on languages, i.e., if L1;L2 2 Lan, then L1 [ L2, L1 \ L2 and L1 ¡ L2 are all languages. (Since L1;L2 2 Lan, we have that L1 µ §¤1 and L2 µ §¤2 , for alphabets §1 and §2. Let § = §1 [ §2, so that § is an alphabet, L1 µ §¤ and L2 µ §¤. Thus L1 [ L2, L1 \ L2 and L1 ¡L2 are all subsets of §¤, and so are all languages.)

The first new operation on languages is language concatenation. The concatenation of languages L1 and L2 (L1 @ L2) is the language

**f x1 @ x2 j x1 2 L1 and x2 2 L2 g:**

I.e., L1 @ L2 consists of all strings that can be formed by concatenating an element of L1 with an element of L2. For example,

{ab; abc} @ {cd; d} = {(ab)(cd); (ab)(d); (abc)(cd); (abc)(d)} = {abcd; abd; abccd}.

Concatenation of languages is associative: for all L1;L2;L3 2 Lan,

**(L1 @ L2) @ L3 = L1 @ (L2 @ L3):**

And, {%} is the identify for concatenation: for all L2 Lan,

**{%} @ L = L @ {%} = L:**

Furthermore, ; is the zero for concatenation: for all L2 Lan,

**; @ L = L @ ; = ;:**

We often abbreviate L1 @ L2 to L1L2.

Now that we know what language concatenation is, we can say what it means to raise a language to a power. We define the language Ln formed byraising language L to a power n 2 N by recursion on n:

**L0 = {%}; for all L2 Lan;**

**L ^{n}^{ 1}= LL^{n}; for all L2 Lan and n 2 N:**

We assign this operation higher precedence than concatenation, so that LLn means L(Ln) in the above definition. For example, we have that

{a; b}^{2}= {a; b}{a; b}^{1}= {a; b}{a; b}{a; b}^{0}

= {a; b}{a; b}{%} = {a; b}{a; b}

= {aa; ab; ba; bb}:

**Proposition:1 **For all L 2 Lan and n;m 2 N, L^{n}^{ }^{m }= L^{n}L^{m}.

**Proof**: An easy mathematical induction on n. The language L and the natural number m can be fixed at the beginning of the proof. 2Thus, if L ∈Lan and n ∈N, then

L^{n}^{ 1 }= L L^{n}(definition); and

L^{n}^{ 1 }= L^{n}L^{1}= L^{n }L (Proposition 1):

Another useful fact about language exponentiation is:

**Proposition 2: **For all w 2 Str and n ∈N, {w}^{n}= {w^{n}}.

**Proof:**By mathematical induction on n. 2

For example, we have that {01}^{4}= {(01)^{4}} = {01010101}.

Now we consider a language operation that is named after Stephen Cole Kleene, one of the founders of formal language theory. The Kleene closure(or just closure) of a language L (L*) is the language

∪{L^{n}j n ∈N}:

Thus, for all w,

w ∈L* iff w ∈A; for some A ∈{Ln j n ∈N}

iff w ∈Ln for some n ∈N:

Or, in other words:

- L* = L0 ∪L1 ∪L2 ∪. . .
- L* consists of all strings that can be formed by concatenating together some number (maybe none) of elements of L (the same element of L can be used as many times as is desired).