# Baum-welch Algorithm

**Introduction:** Baum-Welch Algorithm explains how we can determine the parameters of a hidden Markov model that will best explain the sequences.

When the paths for the training sequences are unknown, there is no longer a direct closed-form equation for the estimated parameter values. An iterative procedure must be used, like the Baum-Welch algorithm. The Baum-Welch algorithm is a special case of the EM algorithm (Section 7.8.1), which is a family of algorithms for learning probabilistic models in problems that involve hidden states.

The Baum-Welch algorithm is shown in Figure 8.13. The problem of finding the optimal transition and emission probabilities is intractable. Instead, the Baum-Welch algorithm finds a locally optimal solution. In step 1, it initializes the probabilities to an arbitrary estimate. It then continuously re-estimates the probabilities (step 2) until convergence (i.e., when there is very little change in the probability values between iterations). The re-estimation first calculates the expected transmission and emission probabilities. The transition and emission probabilities are then updated to maximize the likelihood of the expected values.

In summary, Markov chains and hidden Markov models are probabilistic models in which the probability of a state depends only on that of the previous state. They are particularly useful for the analysis of biological sequence data, whose tasks include evaluation, decoding, and learning. We have studied the forward, Viterbi, and Baum-Welch algorithms. The algorithms require multiplying many probabilities, resulting in very

**Algorithm:** Baum-Welch algorithm. Find the model parameters (transition and emission probabilities) that best explain the training set of sequences.

**Input: **A training set of sequences.

**Output:**

- Transition probabilities, akl ;
- Emission probabilities, ek(b);

**Method:**

(1) initialize the transmission and emission probabilities;

(2) iterate until convergence

(2.1) calculate the expected number of times each transition or emission is used

(2.2) adjust the parameters to maximize the likelihood of these expected values