In the Loop: Markov Models part 1

This episode of In the Loop explains how Markov chains and Markov decision processes work, laying the groundwork for modeling decisions in AI and robotics.

Transcript

Hi, I’m Micaela Kaplan, the ML Evangelist at HumanSignal, and this is In the Loop—the series where we help you stay in the loop with all things data science and AI.

This week, we’re diving into Markov models, which form the foundation of many machine learning techniques, including reinforcement learning, which we covered last time.

Markov models are named after Russian mathematician Andrey Markov. There are four common types, but today we’re focusing on the fully observable ones: Markov chains and Markov decision processes. We’ll cover hidden Markov models next time.

The Markov Assumption

The Markov assumption states that the future state of a system depends only on its current state, not on the entire history. Formally, the probability that qᵢ = a given the full history q₁ … qᵢ₋₁ is the same as the probability that qᵢ = a given just qᵢ₋₁. In simpler terms: the last state is all that matters when predicting the next one.

Markov Chains

Markov chains are the simplest kind of Markov model. They represent a sequence of random variables (or "states") and the probability of transitioning from one state to another.

Formally, a Markov chain includes:

Q: A set of possible states (q₁, q₂, …, qₙ)

A: A transition matrix, where aᵢⱼ is the probability of moving from state i to state j

π: An initial distribution, representing the probability of starting in each state

The sum of all outgoing transition probabilities from a given state, and the sum of all π values, must each equal 1.

Let’s look at a simple weather example:

States (Q): Hot, Cold, Warm

Transition matrix (A): Probabilities for each state-to-state shift

Initial probabilities (π): e.g., 0.3 for Hot, 0.4 for Cold, 0.3 for Warm

To find the probability of a sequence like [Hot → Warm → Hot → Cold], multiply the initial probability of starting in Hot by the transition probability for each step in the sequence:
0.3 × 0.2 × 0.3 × 0.2 = 0.0036
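
To make this concrete, here is a minimal Python sketch of the weather chain. The initial distribution and the three transitions used in the worked example (Hot → Warm, Warm → Hot, Hot → Cold) come from the numbers above; the remaining entries of the transition matrix are illustrative assumptions chosen only so that each row sums to 1.

```python
import numpy as np

# States of the toy weather chain.
states = ["Hot", "Cold", "Warm"]
idx = {s: i for i, s in enumerate(states)}

# Initial distribution pi: probability of starting in each state (from the example).
pi = np.array([0.3, 0.4, 0.3])  # Hot, Cold, Warm

# Transition matrix A: A[i, j] = P(next state j | current state i).
# Hot->Warm (0.2), Warm->Hot (0.3), and Hot->Cold (0.2) match the worked example;
# the other entries are made-up values chosen so each row sums to 1.
A = np.array([
    #  Hot  Cold  Warm
    [0.6, 0.2, 0.2],  # from Hot
    [0.5, 0.3, 0.2],  # from Cold
    [0.3, 0.4, 0.3],  # from Warm
])

# Sanity checks from the definition: pi and every row of A must each sum to 1.
assert np.isclose(pi.sum(), 1.0) and np.allclose(A.sum(axis=1), 1.0)

def sequence_probability(seq):
    """P(q1, ..., qT) = pi(q1) * product of A[q_{t-1}, q_t], using the Markov assumption."""
    p = pi[idx[seq[0]]]
    for prev, curr in zip(seq, seq[1:]):
        p *= A[idx[prev], idx[curr]]
    return p

print(sequence_probability(["Hot", "Warm", "Hot", "Cold"]))  # 0.3 * 0.2 * 0.3 * 0.2 = 0.0036
```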

Markov Decision Processes (MDPs)

Markov decision processes (MDPs) build on Markov chains by incorporating actions and rewards. They're used in systems like robotics or gameplay AI, where agents must make decisions with uncertain outcomes.

MDPs include:

S: The set of possible states the system can be in

A: Set of possible actions the agent can take

R: Reward function, defining the payoff for each action

π (policy): A strategy for choosing actions that maximize cumulative reward

A transition model defines the probabilities of ending up in various states after taking a particular action, and the reward function assigns a payoff to each outcome. For instance, an agent might get +1 for reaching a goal or –1 for stepping into danger.

To optimize decisions, agents rely on the Bellman equation. Algorithms such as value iteration repeatedly update the value of each state (and the policy derived from those values) until they converge on an optimal strategy.
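
For intuition, here is a minimal value iteration sketch in Python over a tiny, made-up MDP with a goal state worth +1 and a danger state worth –1. The states, actions, transition probabilities, and discount factor are illustrative assumptions, not values from the episode; the update inside the loop is the Bellman optimality update described above.

```python
# A tiny, made-up MDP: a transition model T[(state, action)] = {next_state: probability}
# and a reward for landing in each state (+1 for the goal, -1 for danger).
states = ["start", "goal", "danger"]
actions = ["safe", "risky"]

T = {
    ("start", "safe"):   {"start": 0.5, "goal": 0.5},
    ("start", "risky"):  {"goal": 0.7, "danger": 0.3},
    ("goal", "safe"):    {"goal": 1.0},
    ("goal", "risky"):   {"goal": 1.0},
    ("danger", "safe"):  {"danger": 1.0},
    ("danger", "risky"): {"danger": 1.0},
}
R = {"start": 0.0, "goal": 1.0, "danger": -1.0}

gamma = 0.9   # discount factor: how much future reward is worth today
theta = 1e-6  # stop once values change by less than this

# Value iteration: apply the Bellman optimality update
#   V(s) = max_a sum_{s'} T(s' | s, a) * (R(s') + gamma * V(s'))
# until the values stop changing.
V = {s: 0.0 for s in states}
while True:
    delta = 0.0
    for s in states:
        best = max(
            sum(p * (R[s2] + gamma * V[s2]) for s2, p in T[(s, a)].items())
            for a in actions
        )
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < theta:
        break

# Read off the greedy policy: in each state, pick the action with the highest expected value.
policy = {
    s: max(actions, key=lambda a: sum(p * (R[s2] + gamma * V[s2]) for s2, p in T[(s, a)].items()))
    for s in states
}
print(V)       # converged state values
print(policy)  # e.g., the "safe" action is preferred in "start" under these numbers
```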

That’s a wrap on Markov chains and Markov decision processes—essential tools for modeling decision-making in fully observable environments.

Next time, we’ll dive into hidden Markov models, where the true state of the system is only partially visible.

Thanks for staying in the loop.

Want more? Check out our other episodes—and don’t forget to like and subscribe to stay in the loop.
