Markov Chain Monte Carlo for fun and profit

đŸŽČ ⛓ 👉 đŸ§Ș

Introduction¶

Hello and welcome to the documentation for MCMCFF! These notebooks will guide you through the process of writing a medium-sized scientific software project, discussing the decisions and trade-offs made along the way.

Setting up your environment¶

It’s strongly encouraged that you follow along with this series of notebooks in an environment where you can run the cells yourself and change them. You can either clone this git repository and run the cells in a Python environment on your local machine,

git clone https://github.com/ImperialCollegeLondon/ReCoDE_MCMCFF mcmc
cd mcmc

or, if for some reason you can’t do that (because you are on a phone or tablet, for instance), you can instead open this notebook in Binder.

I would also suggest you set up a Python environment just for this project. You can use your preferred method to do this, but I will recommend Anaconda because it’s both what I currently use and what is recommended by Imperial.

# make a new conda environment from the specification in environment.yml
conda env create --file environment.yml

# activate the environment
conda activate mcmc

If you’d prefer to keep this environment nicely stored away in this repository, you can save it in a folder called env by doing

conda env create --prefix env --file environment.yml
conda activate ./env # you have to run this from the directory containing env, of course!


The Problem¶

So, without further ado, let’s talk about the problem we’ll be working on. You don’t necessarily need to understand the full details of it to learn the important lessons, but I’ll give a quick summary here. We want to simulate a physical model called the Ising model, which is famous in physics because it’s about the simplest thing you can come up with that displays a phase transition, a special kind of shift between two different behaviours.

I’m going to weave exposition and code together here, so don’t mind if I take a moment to import some packages and do some housekeeping:

import numpy as np
import matplotlib.pyplot as plt

# This loads some custom styles for matplotlib
import json, matplotlib

with open("../_static/matplotlibrc.json") as f:
    matplotlib.rcParams.update(json.load(f))

np.random.seed(
    42
)  # This makes our random numbers reproducible when the notebook is rerun in order

We’re going to be working with arrays of numbers, so it makes sense to use NumPy. We’ll also want to plot things; the standard choice for this is Matplotlib, though there are other options, pandas and Plotly being notable ones.

Let me quickly plot something to aid the imagination:

state = np.random.choice([-1, 1], size=(100, 100))


def show_state(state, ax=None):
    if ax is None:
        f, ax = plt.subplots()
    ax.matshow(state, cmap="Greys", vmin=-1, vmax=1)
    ax.set(xticks=[], yticks=[])


show_state(state)
[Figure: a random 100×100 state plotted in greyscale]

In my head, the Ising model is basically all about peer pressure. You’re a tiny creature, and you live in a little world where you can only be one of two things: up/down, left/right, in/out, it doesn’t matter.

But what does matter is that you’re doing the same thing as your neighbours. We’re going to visualise this with images like the one above, with black and white representing the two different camps, though at the moment what I’ve plotted is random; there’s no peer pressure going on yet.

The way that a physicist would quantify this peer pressure is to assign a number to each state, with lower numbers meaning more of the little creatures are doing the same thing as their neighbours. We’ll call this the Energy, because physicists always call things Energy; that’s just what we do.

To calculate the energy, we look at all the pixels/creatures, and for each one we look at its four neighbours to the N/E/S/W. Every time we find a neighbour that agrees, we subtract 1 from our total, and every time we find a neighbour that disagrees, we add 1. Creatures at the edges simply have fewer neighbours to worry about.

I’ll show you what the equation for this looks like, but don’t worry too much about it; the word description should be enough to write some code. If we assign the ith creature the label \(s_i = \pm1\), then the energy is

\[E = -\sum_{(i,j)} s_i s_j\]

where the sum runs over each creature \(i\) and each of its neighbours \(j\), so every neighbouring pair is counted twice. As a quick sanity check: in a 2×2 all-up grid each creature has two agreeing neighbours, giving \(E = -8\).

Ok, let’s do some little tests: let’s make the all-up, all-down, and random states and see if we can compute their energies.

all_up = np.ones([100, 100])
all_down = -np.ones([100, 100])
random = np.random.choice([-1, 1], size=(100, 100))

from matplotlib.image import imread

custom = (
    1 - 2 * imread("data/test_state.png")[:, :, 0]
)  # load a 100x100 png, take the red channel, map 0 -> 1 and 1 -> -1

states = [all_up, all_down, random, custom]

f, axes = plt.subplots(ncols=4, figsize=(20, 5))
for ax, state in zip(axes, states):
    show_state(state, ax=ax)
[Figure: the all-up, all-down, random, and custom test states side by side]

If you stare at the first two long enough you’ll realise we can figure out the energy of all_up and all_down without writing any code at all: the \((L-2)^2\) creatures in the interior each have 4 agreeing neighbours, the \(4(L-2)\) creatures along the edges each have 3, and the 4 corner creatures each have 2.

So we know that for the first two:

\[E = -\left(4(L-2)^2 + 12(L-2) + 8\right)\]

And for the random case we can make a pretty good guess that it should be zero on average. The last one we’ll just use as a test case.

def E_prediction_all_the_same(L):
    return -(4 * (L - 2) ** 2 + 12 * (L - 2) + 8)


L = 100
print(f"For L = {L}, We predict E = {E_prediction_all_the_same(L)}")
For L = 100, We predict E = -39600

Exercise 1: Write a function to compute the energy of a state¶

See if you can write a function that calculates the energy and reproduces all the above cases.

# you are welcome to solve this in any way you like, I've just filled out a very simple way to do it
def energy(state):
    E = 0
    N, M = state.shape
    for i in range(N):
        for j in range(M):
            # your code goes here
            pass
    return E


expected_values = [
    E_prediction_all_the_same(100),
    E_prediction_all_the_same(100),
    0,
    "???",
]

f, axes = plt.subplots(ncols=2, nrows=2, figsize=(10, 10))
axes = axes.flatten()
for ax, state, exp_value in zip(axes, states, expected_values):
    show_state(state, ax=ax)
    ax.text(
        0,
        -0.1,
        f"Expected Value: {exp_value}, Your value: {energy(state)}",
        transform=ax.transAxes,
    )
[Figure: the four test states annotated with expected and computed energy values]

Solution 1¶

def energy(state):
    E = 0
    N, M = state.shape
    for i in range(N):
        for j in range(M):
            # handle the north and south neighbours
            for di in [1, -1]:
                if 0 <= (i + di) < N:
                    E -= state[i, j] * state[i + di, j]

            # handle the east and west neighbours
            for dj in [1, -1]:
                if 0 <= (j + dj) < M:
                    E -= state[i, j] * state[i, j + dj]

    return E


expected_values = [
    E_prediction_all_the_same(100),
    E_prediction_all_the_same(100),
    0,
    "???",
]

f, axes = plt.subplots(ncols=2, nrows=2, figsize=(10, 10))
axes = axes.flatten()
for ax, state, exp_value in zip(axes, states, expected_values):
    show_state(state, ax=ax)
    ax.text(
        0,
        -0.1,
        f"Expected Value: {exp_value}, Your value: {energy(state)}",
        transform=ax.transAxes,
    )
[Figure: the four test states with expected values and the solution’s computed energies]

It’s a bit tricky to know what to do with the random value, so let’s try running it 100 times and see what we get:

L = 100  # How large the system should be
N = 100  # How many random samples to use
energies = [energy(np.random.choice([-1, 1], size=(L, L))) for _ in range(N)]
plt.hist(energies)
print(f"mean = {np.mean(energies)}, standard error = {np.std(energies) / np.sqrt(N)}")
mean = -0.24, standard error = 29.819071481184658
[Figure: histogram of the energies of 100 random states]

If you run this a few times you’ll see the mean is usually within a few standard errors of 0, which gives us some confidence. In the testing section we will discuss how we might go about doing automated tests of random variables like this.
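
As a little preview, a test for this kind of random quantity might look something like the sketch below. The tolerance of 5 standard errors is an arbitrary choice of mine, not an established convention:

def test_random_energy_is_roughly_zero(n_samples=100, L=100):
    energies = [energy(np.random.choice([-1, 1], size=(L, L))) for _ in range(n_samples)]
    mean = np.mean(energies)
    standard_error = np.std(energies) / np.sqrt(n_samples)
    # this will still fail by sheer bad luck once in a blue moon,
    # which is exactly the subtlety we'll come back to in the testing section
    assert abs(mean) < 5 * standard_error


test_random_energy_is_roughly_zero()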

Making it a little faster¶

This project is not intended to focus on optimising for performance, but it is worth putting a little effort into making this function faster so that we can run experiments more quickly later.

The main thing that slows us down here is that we’ve written a ‘tight loop’ in pure Python: the energy function is just a loop over the fundamental operation:

E -= state[i,j] * state[i+di, j]

which in theory only requires a few memory load operations, a multiply, an add and a store back to memory (give or take). However, because Python is such a dynamic language, it will have to do extra things like check the types and methods of state and E, invoke their item access method __getitem__, etc. We call this extra work overhead.

In most cases the ratio of overhead to actual computation is not too bad, but here, because the fundamental computation is so simple, the overhead likely accounts for much more of the overall time.
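
To get a feel for this, here’s a rough, illustrative measurement (a sketch using the standard-library timeit module; the exact numbers will vary by machine) comparing the bare arithmetic with the same operation routed through numpy element access:

import timeit

a = np.random.choice([-1, 1], size=(100, 100))

# the raw arithmetic, on plain Python ints
bare = timeit.timeit("e - 1 * 1", globals={"e": 0}, number=100_000)

# the same arithmetic, plus the overhead of numpy element access
with_access = timeit.timeit(
    "e - a[50, 50] * a[51, 50]", globals={"e": 0, "a": a}, number=100_000
)

print(f"bare: {bare:.4f}s, with array access: {with_access:.4f}s")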

In scientific Python code like this there are usually two main options for reducing the overhead:

Using Arrays¶

One option is to work with arrays of numbers and operations defined over those arrays, such as sum, product, etc. NumPy is the canonical example of this in Python, but many machine learning libraries are essentially doing a similar thing. We rely on the library to implement the operations efficiently and try to chain those operations together to achieve what we want. This imposes some limitations on the way we can write our code.
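
As a toy example of the style this forces on you (not the energy function itself, just a sketch), here’s the same neighbour product computed with a Python loop and with a single vectorised NumPy expression:

row = np.random.choice([-1, 1], size=1000)

# loop version: one Python-level iteration (and all its overhead) per element
total = 0
for i in range(len(row) - 1):
    total += row[i] * row[i + 1]

# array version: a single operation over shifted slices of the whole row
total_vectorised = np.sum(row[:-1] * row[1:])

assert total == total_vectorised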

Using Compilation¶

The alternative is that we convert our Python code into a more efficient form that incurs less overhead. This requires a compilation or transpilation step and imposes a different set of constraints on the code.
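
Numba, which we’ll use in a moment, is probably the lowest-effort example of this approach: a decorator asks it to compile a function to machine code the first time that function is called. A minimal sketch:

from numba import jit


@jit(nopython=True)
def sum_of_squares(n):
    # looks like ordinary Python, but runs as compiled machine code
    total = 0
    for i in range(n):
        total += i * i
    return total


sum_of_squares(10)  # the first call triggers the compilation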

It’s a little tricky to decide which of the two approaches will work best for a given problem. My advice would be to have some familiarity with both but ultimately to use what makes your development experience the best, since you’ll likely spend more time writing the code than you will waiting for it to run!

Exercise 2: Write a faster version of energy(state)¶

You can use NumPy, Numba, Cython, or anything else. By what factor can you beat the naive approach? Numba is probably the easiest option here.

def test_energy_function(energy_function):
    # note: `np.all` over a bare generator expression is always truthy,
    # so we use the builtin `all` to actually check every state
    assert all(energy_function(state) == energy(state) for state in states)


def time_energy_function(energy_function):
    return [energy_function(state) for state in states]


def your_faster_energy_function(state):
    return energy(
        state
    )  # <-- replace this with your implementation and compare how fast it runs!


print("Naive baseline implementation")
test_energy_function(
    energy
)  # this should always pass because it's just comparing to itself!
naive = %timeit -o -n 10 time_energy_function(energy)

print("\nYour version")
test_energy_function(your_faster_energy_function)
yours = %timeit -o -n 10 time_energy_function(your_faster_energy_function)
print(f"Your speedup: {naive.best/yours.best :.0f}x !")
Naive baseline implementation
80.9 ms ± 2.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Your version
80.2 ms ± 4.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Your speedup: 1x !

Solution 2¶


from numba import jit


@jit(nopython=True)
def numba_energy(state):
    E = 0
    N, M = state.shape
    for i in range(N):
        for j in range(M):
            # handle the north and south neighbours
            for di in [1, -1]:
                if 0 <= (i + di) < N:
                    E -= state[i, j] * state[i + di, j]

            # handle the east and west neighbours
            for dj in [1, -1]:
                if 0 <= (j + dj) < M:
                    E -= state[i, j] * state[i, j + dj]

    return E


def numpy_energy(state):
    # pair each site with its southern neighbour (shifted slices along axis 0)
    # and its eastern neighbour (shifted slices along axis 1), then sum
    E = -np.sum(state[:-1, :] * state[1:, :]) - np.sum(state[:, :-1] * state[:, 1:])
    return 2 * E  # each bond was counted once, so double to match energy()


print("Naive baseline implementation")
naive = %timeit -o -n 10 time_energy_function(energy)

print("\nNumba version")
test_energy_function(numba_energy)
numba = %timeit -n 10 -o time_energy_function(numba_energy)
print(f"Numba Speedup: {naive.best/numba.best :.0f}x !")

print("\nNumpy version")
test_energy_function(numpy_energy)
numpy = %timeit -n 10 -o time_energy_function(numpy_energy)
print(f"Numpy Speedup: {naive.best/numpy.best :.0f}x !")
Naive baseline implementation
81.2 ms ± 3.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Numba version
212 ”s ± 20.5 ”s per loop (mean ± std. dev. of 7 runs, 10 loops each)
Numba Speedup: 420x !

Numpy version
153 ”s ± 37.5 ”s per loop (mean ± std. dev. of 7 runs, 10 loops each)
Numpy Speedup: 576x !

While writing the faster versions above I realised two things. First, there was a bug in my code! I had written

for dj in [1,-1]:
    if 0 <= (j + dj) < M:
        E -= state[i,j] * state[i+di, j]

where I of course meant to have:

for dj in [1,-1]:
    if 0 <= (j + dj) < M:
        E -= state[i,j] * state[i, j+dj]

I found this while writing the numpy version because the two functions were not giving the same results. I didn’t pick it up when writing the numba code because, as you can see, I just copied the implementation over. So the first lesson is that simple test cases don’t always catch even relatively obvious bugs.
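
A cheap way to guard against this kind of thing is to check independent implementations against each other on random inputs, since random states exercise cases that the hand-picked ones miss. A sketch of that idea (the trial count and system size are arbitrary choices of mine):

def test_implementations_agree(n_trials=10, L=30):
    for _ in range(n_trials):
        state = np.random.choice([-1, 1], size=(L, L))
        # all three implementations should agree exactly on integer states
        assert energy(state) == numba_energy(state) == numpy_energy(state)


test_implementations_agree()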

The second thing was that even my naive function was doing more work than it needed to! Because of the symmetry in how two neighbours talk to each other, we only need to look at each pair once, checking just the southern and eastern neighbours, and then double the result:

def energy2(state):
    E = 0
    N, M = state.shape
    for i in range(N):
        for j in range(M):
            # count each vertical bond once, via the southern neighbour
            if (i + 1) < N:
                E -= state[i, j] * state[i + 1, j]

            # count each horizontal bond once, via the eastern neighbour
            if (j + 1) < M:
                E -= state[i, j] * state[i, j + 1]

    # every bond was counted once rather than twice, so double the total
    return 2 * E


print("\nImproved Naive version")
energy2(states[0])  # run the function once to let numba compile it before timing it
e2 = %timeit -o test_energy_function(energy2)
print(f"Speedup: {naive.best/e2.best :.0f}x !")
Improved Naive version
34.6 ms ± 147 ”s per loop (mean ± std. dev. of 7 runs, 10 loops each)
Speedup: 2x !

Conclusion¶

So far we’ve discussed the problem we want to solve, written a little code, tested it a bit and made some speed improvements.

In the next notebook we will package the code up into a little Python package. This has two big benefits:

  1. I won’t have to redefine the energy function we just wrote in the next notebook

  2. It will help with testing and documenting our code later

%load_ext watermark
%watermark -n -u -v -iv -w -g -r -b -a "Thomas Hodson" -gu "T_Hodson"
Author: Thomas Hodson

Github username: T_Hodson

Last updated: Mon Jul 18 2022

Python implementation: CPython
Python version       : 3.9.12
IPython version      : 8.4.0

Git hash: 03657e08835fdf23a808f59baa6c6a9ad684ee55

Git repo: https://github.com/ImperialCollegeLondon/ReCoDE_MCMCFF.git

Git branch: main

matplotlib: 3.5.1
numpy     : 1.21.5
json      : 2.0.9

Watermark: 2.3.1