Languages

Introduction: Basic data structure or input to grammars or automaton is strings. Strings are defined over an alphabet which is finite. Alphabet may vary depending upon the application. Elements of an alphabet are called symbols. Usually we denote the basic alphabet set either as Σ or T. For example, the following are a few examples of an alphabet set.

Σ1 = {a,b}

Σ2 = {0,1,2}

Σ3 = {0,1,2,3,4,5,6,7,8,9}.

A string or word is a finite sequence of symbols from that alphabet, usually written as concatenated symbols and not separated by gaps or commas. For example, if Σ = {a, b}, a string abbab is a string or word over Σ. If w is a string over an alphabet Σ, then the length of w written as len(w) or |w| is the number of symbols it contains. If |w| = 0, then w is called as empty string denoted either as λ or ε.

For any word w, w ε = εw = w. For any string w = a1 ... an of length n, the reverse of w is written as wR which is the string anan−1 ... a1, where each symbol ai belongs to the basic alphabet Σ. A string z that is appearing consecutively within another string w is called a substring or subword of w. For example, aab is a substring of baabb.

The set of all strings over an alphabet Σ is denoted by Σ* which includes the empty string ε. For example, for Σ = {0, 1}, Σ* = {ε, 0, 1, 00, 01, 10, 11, ...}. Note that Σ* is a countably infinite set. Also, Σn denotes the set of all strings over Σ whose length is n. Hence, Σ* = Σ0 ∪Σ1 ∪Σ2 ∪Σ3 ... and Σ = Σ1 ∪Σ2 ∪Σ3 .... Subsets of Σ* are called languages.

For example, if Σ = {a, b}, the following are languages over Σ

L1 = {ε, a, b}

L2 = {ab, aabb, aaabbb, ...}

L3 = {w ε Σ*| number of a’s and number of b’s in w are equal.}

In the above example, L1 is finite, and L2 and L3 are infinite languages. φ denotes an empty language.

We have the following inductive definition of Σ and Σ*, where Σ is any basic alphabet set.

Definitio. 1: Let Σ be any alphabet set. Σ is a set of non-empty strings over Σ defined as follows:

Basis: If a ∊Σ, then a ∊Σ .

Induction: If α ∊Σ and a ∊Σ, αa, aα are in Σ .

No other element belongs to Σ .

Clearly, the set Σ contains all strings of length n, n ≥ 1.

Example :Let Σ = {0,1,2}. Then

Σ = {0, 1, 2, 00, 01, 02, 10, 11, 12, 20, 21, 22, ... }

Suppose we wish to include ‘ε’ in Σ , we modify the above definition as given below.

Definition 2: Let Σ be any alphabet set. Σ* is defined as follows:

Basis: ε ∊Σ*. , Induction: If α ∊Σ*, α∊Σ, then aα, α a ∊Σ*.

No other element is in Σ*.

Since languages are sets, one can define the set-theoretic operations of union, intersection, difference, and complement in the usual fashion.