误差复印现象:Derivation of the Baum
Next: Estimating the means of Up: Bayesian Learning 2: EM Previous: The EM Algorithm
Derivation of the Baum-Welch algorithm for HMMs
Consider our formulation of HMMs as before, i.e.
Let Z = f(A,B) be a random variable such that xijk = f(A,B) denotes the joint event consisting of the transition from state i to j and the observation of the vk from state j. So the observed output is completely specified by a sequence
Then the parameter of interest,
We wish to find a
The expression on the left is called Baum's auxiliary function. We have already seen that if the above inequality holds, then P'(O) > P(O). So maximizing the LHS wrt.
To do this, we first find a stationary point of the LHS subject to the constraints
Since X completely specifies O (recall that xijk also specifies that symbol vk is observed in state j),
If cijk denotes the number of times xijk occurs in X, then
Hence,
We can also satisfy ourselves that
and so the stationary point obtained by setting the first derivative to zero is indeed a maximum.
Substituting in (17) and setting to zero,
If we now write
where Ki can be interpreted as a normalizing constant. The RHS can now be summed over each of the times
The RHS above is just the reestimated probability of xijk because prior to normalization, the sum yields the expected frequency of the xijk computed using the current values of the pijk.
We can now proceed to calculate each pijk' as above. The set of all such pijk's constitutes our
The EM theorem tells us that if we continue in this fashion, then after each iteration
- 1.
- either the reestimated
is more likely than the original in the sense that or - 2.
- we have reached a stationary point of the likelihood function w at which
This is an algorithm, because it is iterative. There is an expectation step (aka estimation step) and a maximization step (aka modification step) in each iteration.
Note that the algorithm is not guaranteed to converge at the global maximum. But it has been found to yield good results in practice.
Next: Estimating the means of Up: Bayesian Learning 2: EM Previous: The EM Algorithm Anand Venkataraman
1999-09-16