Automata | Comp. Sc. Engg.

Orders For Strings

Introduction: There are two systematic methods available for listing the contents of sets of words. The first is known as the lexicographic order, or “order as in a dictionary” while the second is called the numeric order which lists strings by length-groups: all strings of a lower length are listed before a string of a higher length is listed. Within a length-group, the strings are listed in lexicographic order. We now present some details as well as examples.

Lexicographic order for strings

In this order, given two strings x and y, we compare the characters comprising x and y position by position. We do this for each position i ∈ min(length(x), length(y)) (here, min finds the minimum of two numbers). For example, if we compare apple and pig, we compare for each position i ∈ 3, as pig is the shorter of the two and has length 3.

While comparing characters at a position, we compare with respect to the ASCII code of the character at that position.3

Given all this, for strings x and y such that x _= y, x said to be strictly before y in lexicographic order, written x <lex y, under one of two circumstances:

  1. If there exists a position j ∈ min(length(x), length(y)) such that x[j] < y[j], and for all i ∈ j (meaning, for positions 0 . . . (j −1)), x[i] = y[i]. For example, aaabb <lex aaaca, aaron <lex abate, apple <lex pig, and pig <lex putter.
  2. For all positions j ∈ min(length(x), length(y)), we have x[j] = y[j], and length(x) < length(y).

Definition: Define string identity, x =lex y, to hold for two identical strings x and y. Also define ≤lex to be <lex ∪ =lex, and >lex to be thecomplement of ≤lex.

Definition: Given two strings x and y, x is before y in lexicographic order iff x ≤lex y.

Let us now go through some examples of lexicographic order:

• ε <lex aa follows from condition 2 above. This is because there are no positions within a length of 0, and so the only condition to be satisfied is length(x) < length(y), which is true for x _= y and

x = ε. Since ε <lex aa, we also have ε ≤lex aa. • a <lex aa <lex aaa <lex aaaa . . .. Hence, these are also in the ≤lexrelation.

3 The ASCII code totally orders all the characters available on modern keyboards. Refer to the web or a book on hardware design to know what this total order is. We are taking this more pragmatic route of relying on ASCII codes to keep our definitions concrete.

1. If length(x) = length(y), then x <numeric y exactly when x <lex y.

2. Otherwise, if length(x) < length(y), then x <numeric y.

3. Otherwise (length(x) > length(y)) x >numeric y.

As before, we define <numeric to be strictly before in numeric order, and

≤numeric = <numeric ∪ =lex

to be before in numeric order, or simply “in numeric order.”

The numeric order of listing the strings over alphabet {a, b} will yield

the sequence

ε, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, aaaa, . . . .