Viterbi Algorithm
Introduction: Given a sequence, x, what is the most probable path in the model that generates x? This problem of decoding can be solved using the Viterbi algorithm.
Many paths can generate x. We want to find the most probable one, π*, that is, the path that maximizes the probability of having generated x. This is π*=argmaxπ P(π|x). It so happens that this is equal to argmaxpπ(x, π). For a sequence of length L, there are |Q|L possible paths, where |Q| is the number of states in the model. It is infeasible to enumerate all of these possible paths! Once again, we resort to a dynamic programming technique to solve the problem.
At each step along the way, the Viterbi algorithm tries to find the most probable path leading from one symbol of the sequence to the next. We define vl(i) to be the probability of the most probable path accounting for the first i symbols of x and ending in state l. To find π*, we need to compute maxkvk(L), the probability of the most probable path accounting for all of the sequence and ending in the end state. The probability, vl(i), is
vl(i) = el(xi) _maxk(vl(k)akl),
which states that the most probable path that generates x1 : : :xi and ends in state l has to emit xi in state xl (hence, the emission probability, el(xi)) and has to contain the most probable path that generates x1 : : :xi-1 and ends in state k, followed by a transition from state k to state l (hence, the transition probability, akl ). Thus, we can compute vk(L) for any state, k, recursively to obtain the probability of the most probable path.
The Viterbi algorithm is shown in Figure 8.12. Step 1 performs initialization. Every path starts at the begin state (0) with probability 1. Thus, for i=0, we have v0 (0)=1, and the probability of starting at any other state is 0. Step 2 applies the recurrence formula for i = 1 to L. At each iteration, we assume that we know the most likely path for x1 : : : xi-1 that ends in state k, for all k 2 Q. To find the most likely path to the i-th state from there, we maximize vk(i-1)akl over all predecessors k ∈Q of l. To obtain vl(i), we multiply by el(xi) since we have to emit xi from l. This gives us the first formula in step 2. The values vk(i) are stored in a Q X L dynamic programming matrix. We keep pointers (ptr) in this matrix so that we can obtain the path itself. The algorithm terminates in step 3, where finally, we have maxkvk(L).We enter the end state of 0 (hence, the transition probability, ak0) but do not emit a symbol. The Viterbi algorithm runs in O(|Q|2|L|) time. It is more efficient than the forward algorithm because it investigates only the most probable path and avoids summing over all possible paths.
Algorithm: Viterbi algorithm. Find the most probable path that emits the sequence of symbols, x.
Input:
- A hidden Markov model, defined by a set of states, Q, that emit symbols, and by transition and emission probabilities;
- x, a sequence of symbols.
Output:The most probable path, π*.
Method:
(1) Initialization (i = 0): v0(0) = 1, vk(0) = 0 for k > 0
(2) Recursion (i = 1: : :L):
- vl (i) = el (xi)maxk(vk(i-1)akl )
- ptri(l) = argmaxk(vk(i-1)akl )
(3) Termination:
- P(x; π*) = maxk(vk(L)ak0)
- π*L = argmaxk(vk(L)ak0)