Viterbi Decoding with TensorFlow

The main concepts of the algorithm:

  • Initial probabilities: determine how likely the process is to start in each possible state. In the context of hidden Markov models, the initial probabilities (also called the initial distribution) tell us the odds of the system being in each possible state at the start. For example, if there are three possible states S1, S2 and S3, the initial probabilities can be written as P(S1), P(S2) and P(S3), where the sum of all probabilities equals 1.

  • Transition matrix: a matrix in which each cell a_{ij} represents the probability of transitioning from state S_i to state S_j. The transition matrix captures the dynamics of the system by showing how the probabilities of state changes depend on the current state. In hidden Markov models, if the system is in state S_i at time t, then a_{ij} gives the probability of moving to state S_j at time t+1.

  • Emission matrix: a matrix that specifies the probability of observing each possible event in each state. The emission probabilities describe how the observed data depends on the hidden states. For example, if O_k is an observable event, then the probability of this event occurring while the system is in state S_i is denoted b_i(O_k). Each row of the matrix contains the probabilities of observing the various events for a particular state, and the probabilities in a row sum to 1.

Let's say we're observing a sequence of weather conditions, and there are two hidden states: sunny and rainy. The initial probabilities could be P(\text{sunny}) = 0.6 and P(\text{rainy}) = 0.4. The transition matrix might say that if it is sunny today, then tomorrow will also be sunny with probability 0.7 and rainy with probability 0.3, and vice versa. The emission matrix might say that on a sunny day we observe the sun with probability 0.9 and rain with probability 0.1, and vice versa.
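To make this concrete, here is a small sketch of how these three components could be written down as NumPy arrays, using the numbers from the weather example above (the variable names are just illustrative):

import numpy as np

# Initial state distribution: P(sunny), P(rainy)
initial_prob = np.array([0.6, 0.4])

# Transition matrix: rows are the current state, columns the next state
# trans_prob[i][j] = P(state j tomorrow | state i today)
trans_prob = np.array([[0.7, 0.3],
                       [0.3, 0.7]])

# Emission matrix: rows are hidden states, columns are observations (sun, rain)
# obs_prob[i][k] = P(observation k | state i)
obs_prob = np.array([[0.9, 0.1],
                     [0.1, 0.9]])

# Each row of the transition and emission matrices sums to 1
assert np.allclose(trans_prob.sum(axis=1), 1.0)
assert np.allclose(obs_prob.sum(axis=1), 1.0)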

Implementation of Viterbi Decoding Function in TF

Let's install TensorFlow itself:

pip install tensorflow

You will also need NumPy, which can be installed the same way:
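pip install numpy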

At the start, we will create a cache that will be used to store intermediate probability values:

import tensorflow as tf

class HMM:
    def __init__(self, initial_prob, trans_prob, obs_prob):
        self.initial_prob = tf.constant(initial_prob, dtype=tf.float64)  # shape [N]
        self.trans_prob = tf.constant(trans_prob, dtype=tf.float64)      # shape [N, N]
        self.obs_prob = tf.constant(obs_prob, dtype=tf.float64)          # shape [N, M]
        self.num_states = self.trans_prob.shape[0]
        # Viterbi cache: a column vector holding the best path probability for each state
        self.viterbi = tf.Variable(tf.reshape(self.initial_prob, [-1, 1]), trainable=False)

    def get_emission(self, obs_idx):
        # Column of the emission matrix: the probability of the given observation in each state
        return tf.reshape(self.obs_prob[:, obs_idx], [-1, 1])
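A quick way to check the shapes is to instantiate the class with a tiny model (the numbers below are the example parameters used later in the article) and look at the cache and one emission column:

hmm = HMM([0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.1, 0.9]])

print(hmm.viterbi.numpy())          # [[0.6] [0.4]] -- one score per hidden state
print(hmm.get_emission(1).numpy())  # [[0.5] [0.9]] -- P(observation 1 | each state)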

Next, we define the operation that updates the Viterbi cache at each time step:

class HMM:
    # ... __init__ and get_emission from above ...

    def decode_op(self, obs_idx):
        emissions = self.get_emission(obs_idx)                           # [N, 1]
        # Outer product: transitions[i][j] = viterbi[i] * emissions[j]
        transitions = tf.matmul(self.viterbi, tf.transpose(emissions))   # [N, N]
        weighted_transitions = transitions * self.trans_prob
        # For every current state j, keep the best score over all predecessors i
        viterbi_update = tf.reduce_max(weighted_transitions, axis=0)
        return tf.reshape(viterbi_update, tf.shape(self.viterbi))
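This is the standard Viterbi recursion. If v_t(j) denotes the probability of the most likely state sequence that ends in state S_j after the first t observations, then v_t(j) = max_i v_{t-1}(i) \cdot a_{ij} \cdot b_j(O_t). The outer product followed by the element-wise multiplication with the transition matrix builds exactly the terms v_{t-1}(i) \cdot a_{ij} \cdot b_j(O_t), and reduce_max over the predecessor axis takes the maximum over i for every state j at once.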

Back pointers are needed to remember, for each state, which previous state produced the best score, so that the most likely path can be reconstructed at the end:

class HMM:
    # ... __init__, get_emission and decode_op from above ...

    def backpt_op(self):
        # Broadcast the cache across columns: back_transitions[i][j] = viterbi[i]
        back_transitions = tf.matmul(self.viterbi,
                                     tf.ones([1, self.num_states], dtype=tf.float64))
        weighted_back_transitions = back_transitions * self.trans_prob
        # Index of the best predecessor for every current state (the emission term is
        # omitted: it is the same for every predecessor, so it does not affect the argmax)
        return tf.argmax(weighted_back_transitions, axis=0)
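Continuing the small sketch above (and assuming the three snippets have been combined into a single class definition), backpt_op returns one index per current state, naming its best predecessor under the current contents of the cache:

print(hmm.backpt_op().numpy())  # [0 1]: state 0 is best reached from state 0, state 1 from state 1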

We combine all the parts into a complete decoding run:

import numpy as np

# example HMM parameters
initial_prob = np.array([0.6, 0.4])
trans_prob = np.array([[0.7, 0.3], [0.4, 0.6]])
obs_prob = np.array([[0.5, 0.5], [0.1, 0.9]])

# example sequence of observations
observations = np.array([0, 1, 1])

# initialise the model
hmm = HMM(initial_prob, trans_prob, obs_prob)

# initialise the Viterbi cache: initial probabilities weighted by
# the emission of the first observation
hmm.viterbi.assign(tf.reshape(hmm.initial_prob, [-1, 1]) * hmm.get_emission(observations[0]))

# back pointers: best predecessor of every state at every time step
backpts = np.ones((hmm.num_states, len(observations)), dtype="int32") * -1

for t in range(1, len(observations)):
    backpt = hmm.backpt_op()                       # uses the cache from step t-1
    viterbi_update = hmm.decode_op(observations[t])
    hmm.viterbi.assign(viterbi_update)
    backpts[:, t] = backpt.numpy()

# recover the most likely sequence of states by following the back pointers
tokens = [int(np.argmax(hmm.viterbi.numpy()))]
for i in range(len(observations) - 1, 0, -1):
    tokens.append(backpts[tokens[-1], i])
tokens.reverse()

print('Most likely hidden states are', tokens)
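Because the example sequence is short, it is easy to double-check the answer by brute force: enumerate every possible hidden state sequence, score it, and keep the best one. A minimal sketch, assuming the initial_prob, trans_prob, obs_prob and observations arrays defined above:

from itertools import product

def brute_force_decode(initial_prob, trans_prob, obs_prob, observations):
    # Score every possible sequence of hidden states and return the best one
    num_states = trans_prob.shape[0]
    best_seq, best_score = None, -1.0
    for seq in product(range(num_states), repeat=len(observations)):
        score = initial_prob[seq[0]] * obs_prob[seq[0], observations[0]]
        for t in range(1, len(observations)):
            score *= trans_prob[seq[t - 1], seq[t]] * obs_prob[seq[t], observations[t]]
        if score > best_score:
            best_seq, best_score = list(seq), score
    return best_seq

# should print the same sequence as the Viterbi decoder above
print('Brute force check:', brute_force_decode(initial_prob, trans_prob, obs_prob, observations))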

The Viterbi algorithm is not limited to theoretical problems and is used in telecommunications, bioinformatics, natural language processing, etc.

All current methods and tools of DS and ML can be mastered in OTUS online courses: the catalog lists all the programs, and in the calendar you can sign up for open lessons.
