Home Experiments Math Science Fair Projects Mathematics Jokes and Archimedes Mathematics Resources Mathematicians Warning!
 
 


Viterbi Algorithm & Viterbi Decoder




 


Experiments Home
Mathematics
Viterbi Algorithm & Viterbi Decoder





Mathematics Science Fair Projects Home

  • Statistics & Probability
  • Geometry & Trigo
  • Applied Mathematics
  • Number Theory
  • Learning & Cognition
  • Miscellany


  • Scientists and Inventors

    Scientists and Inventors















    Scientists and Inventors

    Scientists and Inventors
    Viterbi Algorithm & Viterbi Decoder

    The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states – called the Viterbi path – that results in a sequence of observed events, especially in the context of hidden Markov models. The forward algorithm is a closely related algorithm for computing the probability of a sequence of observed events. These algorithms form a subset of information theory.

    A viterbi decoder (see below) uses the Viterbi algorithm for decoding a bitstream that has been encoded using Forward error correction based on a Convolutional code.

    The algorithm makes a number of assumptions. First, both the observed events and hidden events must be in a sequence. This sequence often corresponds to time. Second, these two sequences need to be aligned, and an observed event needs to correspond to exactly one hidden event. Third, computing the most likely hidden sequence up to a certain point t must depend only on the observed event at point t, and the most likely sequence at point t − 1. These assumptions are all satisfied in a first-order hidden Markov model.

    The terms "Viterbi path" and "Viterbi algorithm" are also applied to related dynamic programming algorithms that discover the single most likely explanation for an observation. For example, in stochastic parsing a dynamic programming algorithm can be used to discover the single most likely context-free derivation (parse) of a string, which is sometimes called the "Viterbi parse".

    The Viterbi algorithm was conceived by Andrew Viterbi as an error-correction scheme for noisy digital communication links, finding universal application in decoding the convolutional codes used in both CDMA and GSM digital cellular, dial-up modems, satellite, deep-space communications, and 802.11 wireless LANs. It is now also commonly used in speech recognition, keyword spotting, computational linguistics, and bioinformatics. For example, in speech-to-text (speech recognition), the acoustic signal is treated as the observed sequence of events, and a string of text is considered to be the "hidden cause" of the acoustic signal. The Viterbi algorithm finds the most likely string of text given the acoustic signal.

    Contents

    Viterbi Algorithm
    Viterbi Decoder

    Overview

    First, we elaborate on the assumptions listed above. The Viterbi algorithm operates on a state machine assumption. That is, at any system we are modeling is in some state. There are a finite number of states, however large, that can be list "survivor path". This is a fundamental assumption of the algorithm because the algorithm will examine all possible paths leading to a state and only keep the one most likely. This way the algorithm does not have to keep track of all possible paths, only one per state.

    A second key assumption is that a transition from a previous state to a new state is marked by an incremental metric, usually a number. This transition is computed from the event. The third key assumption is that the events are cumulative over a path in some sense, usually additive. So the crux of the algorithm is to keep a number for each state. When an event occurs, the algorithm examines moving forward to a new set of states by combining the metric of a possible previous state with the incremental metric of the transition due to the event and chooses the best. The incremental metric associated with an event depends on the transition possibility from the old state to the new state. For example in data communications, it may be possible to only transmit half the symbols from an odd numbered state and the other half from an even numbered state. Additionally, in many cases the state transition graph is not fully connected. A simple example is a car that has 3 states — forward, stop and reverse — and a transition from forward to reverse is not allowed. It must first enter the stop state. After computing the combinations of incremental metric and state metric, only the best survives and all other paths are discarded. There are modifications to the basic algorithm which allow for a forward search in addition to the backwards one described here.

    Path history must be stored. In some cases, the search history is finite because the state machine at the encoder starts in a known state and there is sufficient memory to keep all the paths. In other cases, a programmatic solution must be found for limited resources: one example is convolutional encoding, where the decoder must truncate the history at a depth large enough to keep performance to an acceptable level. Although the Viterbi algorithm is very efficient and there are modifications that reduce the computational load, the memory requirements tend to remain constant.

    A concrete example

    Assume you have a friend who lives far away and to whom you talk daily over the telephone about what he did that day. Your friend is only interested in three activities: walking in the park, shopping, and cleaning his apartment. The choice of what to do is determined exclusively by the weather on a given day. You have no definite information about the weather where your friend lives, but you know general trends. Based on what he tells you he did each day, you try to guess what the weather must have been like.

    You believe that the weather operates as a discrete Markov chain. There are two states, "Rainy" and "Sunny", but you cannot observe them directly, that is, they are hidden from you. On each day, there is a certain chance that your friend will perform one of the following activities, depending on the weather: "walk", "shop", or "clean". Since your friend tells you about his activities, those are the observations. The entire system is that of a hidden Markov model (HMM).

    You know the general weather trends in the area, and what your friend likes to do on average. In other words, the parameters of the HMM are known. You can write them down in the Python programming language:

    states = ('Rainy', 'Sunny')
    
    observations = ('walk', 'shop', 'clean')
    
    start_probability = {'Rainy': 0.6, 'Sunny': 0.4}
    
    transition_probability = {
       'Rainy' : {'Rainy': 0.7, 'Sunny': 0.3},
       'Sunny' : {'Rainy': 0.4, 'Sunny': 0.6},
       }
    
    emission_probability = {
       'Rainy' : {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
       'Sunny' : {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
       }
    

    In this piece of code, start_probability represents your belief about which state the HMM is in when your friend first calls you (all you know is that it tends to be rainy on average). The particular probability distribution used here is not the equilibrium one, which is (given the transition probabilities) actually approximately {'Rainy': 0.571, 'Sunny': 0.429}. The transition_probability represents the change of the weather in the underlying Markov chain. In this example, there is only a 30% chance that tomorrow will be sunny if today is rainy. The emission_probability represents how likely your friend is to perform a certain activity on each day. If it is rainy, there is a 50% chance that he is cleaning his apartment; if it is sunny, there is a 60% chance that he is outside for a walk.

    You talk to your friend three days in a row and discover that on the first day he went for a walk, on the second day he went shopping, and on the third day he cleaned his apartment. You have two questions: What is the overall probability of this sequence of observations? And what is the most likely sequence of rainy/sunny days that would explain these observations? The first question is answered by the forward algorithm; the second question is answered by the Viterbi algorithm. These two algorithms are structurally so similar (in fact, they are both instances of the same abstract algorithm) that they can be implemented in a single function:

    def forward_viterbi(y, X, sp, tp, ep):
       T = {}
       for state in X:
           ##          prob.      V. path  V. prob.
           T[state] = (sp[state], [state], sp[state])
       for output in y:
           U = {}
           for next_state in X:
               total = 0
               argmax = None
               valmax = 0
               for source_state in X:
                   (prob, v_path, v_prob) = T[source_state]
                   p = ep[source_state][output] * tp[source_state][next_state]
                   prob *= p
                   v_prob *= p
                   total += prob
                   if v_prob > valmax:
                       argmax = v_path + [next_state]
                       valmax = v_prob
               U[next_state] = (total, argmax, valmax)
           T = U
       ## apply sum/max to the final states:
       total = 0
       argmax = None
       valmax = 0
       for state in X:
           (prob, v_path, v_prob) = T[state]
           total += prob
           if v_prob > valmax:
               argmax = v_path
               valmax = v_prob
       return (total, argmax, valmax)
    

    The function forward_viterbi takes the following arguments: y is the sequence of observations, e.g. ['walk', 'shop', 'clean']; X is the set of hidden states; sp is the start probability; tp are the transition probabilities; and ep are the emission probabilities.

    The algorithm works on the mappings T and U. Each is a mapping from a state to a triple (prob, v_path, v_prob), where prob is the total probability of all paths from the start to the current state, v_path is the Viterbi path up to the current state, and v_prob is the probability of the Viterbi path up to the current state. The mapping T holds this information for a given point t in time, and the main loop constructs U, which holds similar information for time t+1. Because of the Markov property, information about any point in time prior to t is not needed.

    The algorithm begins by initializing T to the start probabilities: the total probability for a state is just the start probability of that state; and the Viterbi path to a start state is the singleton path consisting only of that state; the probability of the Viterbi path is the same as the start probability.

    The main loop considers the observations from y in sequence. It is loop invariant because T contains the correct information up to but excluding the point in time of the current observation. The algorithm then computes the triple (prob, v_path, v_prob) for each possible next state. The total probability of a given next state, total is obtained by adding up the probabilities of all paths reaching that state. More precisely, the algorithm iterates over all possible source states. For each source state, T holds the total probability of all paths to that state. This probability is then multiplied by the emission probability of the current observation and the transition probability from the source state to the next state. The resulting probability prob is then added to total. The probability of the Viterbi path is computed in a similar fashion, but instead of adding across all paths one performs a discrete maximization. Initially the maximum value valmax is zero. For each source state, the probability of the Viterbi path to that state is known. This too is multiplied with the emission and transition probabilities and replaces valmax if it is greater than its current value. The Viterbi path itself is computed as the corresponding argmax of that maximization, by extending the Viterbi path that leads to the current state with the next state. The triple (prob, v_path, v_prob) computed in this fashion is stored in U and once U has been computed for all possible next states, it replaces T, thus ensuring that the loop invariant holds at the end of the iteration.

    In the end another summation/maximization is performed (this could also be done inside the main loop by adding a pseudo-observation after the last real observation).

    In the running example, the forward/Viterbi algorithm is used as follows:

    def example():
       return forward_viterbi(observations,
                              states,
                              start_probability,
                              transition_probability,
                              emission_probability)
    print example()
    

    This reveals that the total probability of ['walk', 'shop', 'clean'] is 0.033612 and that the Viterbi path is ['Sunny', 'Rainy', 'Rainy', 'Rainy']. The Viterbi path contains four states because the third observation was generated by the third state and a transition to the fourth state. In other words, given the observed activities, it was most likely sunny when your friend went for a walk and then it started to rain the next day and kept on raining.

    When implementing this algorithm, it should be noted that many languages use Floating Point arithmetic - as p is small, this may lead to underflow in the results. A common technique to avoid this is to take the logarithm of the probabilities and use it throughout the computation, the same technique used in the Logarithmic Number System. Once the algorithm has terminated, an accurate value can be obtained by performing the appropriate exponentiation.

    Extensions

    With the algorithm called iterative Viterbi decoding one can find the subsequence of an observation that matches best (on average) to a given HMM. Iterative Viterbi decoding works by iteratively invoking a modified Viterbi algorithm, reestimating the score for a filler until convergence.

    An alternate algorithm, the Lazy Viterbi algorithm, has been proposed recently. This works by not expanding any nodes until it really needs to, and usually manages to get away with doing a lot less work (in software) than the ordinary Viterbi algorithm for the same result - however, it is not so easy to parallelize in hardware.

    See also

    References

    • J Feldman, I Abou-Faycal and M Frigo. A Fast Maximum-Likelihood Decoder for Convolutional Codes.

    External links

    Viterbi Decoder

    A viterbi decoder uses the Viterbi algorithm for decoding a bitstream that has been encoded using Forward error correction based on a Convolutional code.

    There are other algorithms for decoding a convolutionally encoded stream (for example, the Fano algorithm). The Viterbi algorithm is the most resource-consuming, but it does the maximum likelihood decoding. It is most often used for decoding convolutional codes with constraint lengths k<=10, but values up to k=15 are used in practice.

    Viterbi decoding was developed by Andrew J. Viterbi and published in the paper "Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm", IEEE Transactions on Information Theory, Volume IT-13, pages 260-269, in April, 1967.

    There are both hardware (in modems) and software implementations of a Viterbi decoder.

    Contents

    Hardware implementation

    A common way to implement a hardware viterbi decoder
    A common way to implement a hardware viterbi decoder

    A hardware viterbi decoder for basic (not perforated) code usually consists of the following major blocks:

    • Branch metric unit (BMU)
    • Path metric unit (PMU)
    • Traceback unit (TBU)

    Branch metric unit (BMU)

    A sample implementation of a branch metric unit
    A sample implementation of a branch metric unit

    A branch metric unit's function is to calculate branch metrics, which are normed distances between every possible symbol in the code alphabet, and the received symbol.

    There are hard decision and soft decision viterbi decoders. The decoder of the first type receives a simple bitstream on its input. In this case a Hamming distance is used as a metric. The second-type decoder receives a bitstream containing information about the reliability of each received symbol. Usually this information is encoded as follows (the table is shown for 3-bit input):

    value meaning
    000 strongest 0
    001 relatively strong 0
    010 relatively weak 0
    011 weakest 0
    100 weakest 1
    101 relatively weak 1
    110 relatively strong 1
    111 strongest 1

    Of course, it is not the only way to encode reliablility data.

    The squared Euclidean distance is used as a metric for soft decision decoders.

    Path metric unit (PMU)

    A sample implementation of a path metric unit for a specific K=4 decoder
    A sample implementation of a path metric unit for a specific K=4 decoder

    A path metric unit summarizes branch metrics to get metrics for 2K − 1 paths, one of which can eventually be chosen as optimal. Every clock it makes 2K − 1 decisions, throwing off wittingly nonoptimal paths. The results of these decisions are written to the memory of a traceback unit.

    The core elements of a PMU are ACS (Add-Compare-Select) units. The way in which they are connected between themselves is defined by a specific code's trellis diagram.

    Since branch metrics are always \ge 0, there must be an additional circuit preventing metric counters from overflow (it isn't shown on the image). An alternate method that eliminates the need to monitor the path metric growth is to allow the path metrics to "roll over", to use this method it is necessary to make sure the path metric accumulators contain enough bits to prevent the "best" and "worst" values from coming within 2(n-1) of each other. The compare circuit is essentially unchanged.

    A sample implementation of an ACS unit
    A sample implementation of an ACS unit

    It is possible to monitor the noise level on the incoming bit stream by monitoring the rate of growth of the "best" path metric, a simpler way to do this is to monitor a single location or "state" and watch it pass "upward" through say four discrete levels within the range of the accumulator. As it passes upward through each of these thresholds, a counter is incremented that reflects the "noise" present on the incoming signal.

    Traceback unit (TBU)

    A sample implementation of a traceback unit
    A sample implementation of a traceback unit

    Back-trace unit restores an (almost) maximum-likelihood path from the decisions made by PMU. Since it does it in inverse direction, a viterbi decoder comprises a FILO (first-in-last-out) buffer to reconstruct a correct order.

    Note that the implementation shown on the image requires double frequency. There are some tricks that eliminate this requirement.

    Implementation issues

    Euclidean metric computation

    The squared norm distance (l2) distance between the received and the actual symbols in the code alphabet may be further simplified into a linear sum/difference form, which makes it less computationally intensive.

    Consider a 1/2 convolutional coder, which generates 2 bits (00, 01, 10 or 11) for every input bit (1 or 0). These Return-to-Zero signals are translated into a Non-Return-to-Zero form shown alongside.

    code alphabet vector mapping
    00 1, 1
    01 1, -1
    10 -1, 1
    11 -1, -1

    Each received symbol may be represented in vector form as vr = {r0, r1}, where r0 and r1 are soft decision values, whose magnitudes signify the joint reliability of the received vector, vr.

    Every symbol in the code alphabet may, likewise, be represented by the vector vi = {±1, ±1}.

    The actual computation of the Euclidean distance metric is:

      D = ||vr - vi||2
        
        = ||vr||2 - 2 (vr . vi) + ||vi||2
    

    Each square term is a normed distance, depicting the energy of the symbol. For ex., the energy of the symbol vi = {±1, ±1} may be computed as

      ||vi||2 = (±1)2 + (±1)2 = 2
    

    Thus, the energy term of all symbols in the code alphabet is constant (at (normalized) value 2).

    The Add-Compare-Select (ACS) operation compares the metric distance between the received symbol ||vr|| and any 2 sumbols in the code alphabet whose paths merge at a node in the corresponding trellis, ||vi(0)|| and ||vi(1)||. This is equivalent to comparing

      D0 = ||vr||2 - 2 (vr . vi(0)) + ||vi(0)||2
    

    and

      D1 = ||vr||2 - 2 (vr . vi(1)) + ||vi(1)||2
    

    But, from above we know that the energy of vi is constant (equal to (normalized) value of 2), and the energy of vr is the same in both cases. This reduces the comparison to a minima function between the 2 (middle) dot product terms,

         min((- 2 (vr . vi(0)), (- 2 (vr . vi(1)))
    
      =  max((vr . vi(0)), (vr . vi(1)))
    

    since a min operation on negative numbers may be interpreted as an equivalent max operation on positive quantities.

    Each dot product term may be expanded as

      max(({r0, r1} . {±1, ±1}), ({r0, r1} . {±1, ±1}))
    
    = max(± r0 ± r1, ± r0 ± r1)
    

    where, the signs of each term depend on symbols, vi(0) and vi(1), being compared. Thus, the squared Euclidean metric distance calculation to compute the branch metric may be performed with a simple add/subtract operation.

    Traceback

    The general approach to traceback is to accumulate path metrics for up to five times the constraint length (5 * (K - 1)), find the node with the largest accumulated cost, and begin traceback from this node.

    However, computing the node which has accumulated the largest cost (either the largest or smallest integral path metric) involves finding the maxima or minima of several (usually 2K-1) numbers, which may be time consuming when implemented on embedded hardware systems.

    Most communication systems employing Viterbi decoding involving data packets of fixed sizes, with a fixed bit/byte pattern either at the beginning or/and at the end of the data packet. By using the known bit/byte pattern as reference, the start node may be set to a fixed value, thereby obtaining a perfect Maximum Likelihood Path during traceback.

    Limitations

    Generally speaking, the viterbi decoder doesn't produce an exact maximum-likelihood stream due to these limitations:

    • Quantizing of an input signal, branch and path metrics.
    • Limited traceback length.

    Perforated codes

    A hardware viterbi decoder of perforated codes is commonly implemented in such a way:

    • A deperforator, which transforms the input stream into the stream which looks like an original (imperforated) stream with ERASE marks at the places where bits were erased.
    • A basic viterbi decoder understanding these ERASE marks (that is, not using them for branch metric calculation).

    Software implementation

    See Viterbi algorithm.

    One of the most time-consuming operations is an ACS butterfly, which is usually implemented using an assembly language and appropriate instruction set extensions (such as SSE2) to speed up the decoding time.

    Applications

    The Viterbi decoding algorithm is widely used in the following areas:

    External links

    • [1] Details on Viterbi decoding, as well as a bibliography.
    • [2] r=1/6 k=15 coding for the Cassini mission to Saturn
    • [3] GPL Viterbi decoder software

    This article is licensed under the GNU Free Documentation License. It uses material from Wikipedia Encyclopedia article "Viterbi Algorithm"

    Scientists and Inventors    Scientists and Inventors    Scientists and Inventors   

    My Dog Kelly

    Site Map ♣ About Us ♣ Patent-Invent ♣ Free Theses, Dissertations & Patents

    Comments and inquiries could be addressed to:
    webmaster@julianTrubin.com


    Last updated: April 2008
    Copyright © 2003-2008 Julian Rubin