Markov Chain Monte Carlo for fun and profit
đČ âïž đ đ§Ș
Introduction¶
Hello and welcome to the documentation for MCMCFF! These notebooks will guide you through the process of writing a medium-sized scientific software project, discussing the decision and trade-offs made along the way.
Setting up your environment¶
Itâs strongly encouraged that you follow along this series of notebooks in an environment where you can run the cells yourself and change them. You can either clone this git repository and run the cells in a Python environment on your local machine,
git clone https://github.com/ImperialCollegeLondon/ReCoDE_MCMCFF mcmc
cd mcmc
or if for some reason you canât do that (because you are on a phone or tablet for instance) you can instead open this notebook in binder
I would also suggest you set up a Python environment just for this project. You can use your preferred method to do this, but I will recommend Anaconda
because itâs both what I currently use and what is recommended by Imperial.
#make a new conda environment from the specification in environment.yml
conda env create --file environment.yml
#activate the environment
conda activate mcmc
If youâd prefer to keep this environment nicely stored away in this repository, you can save in a folder called env by doing
conda env create --prefix env --file environment.yml
conda activate ./env #you have to run this in the enclosing directory of course!
Further Reading on how to set up an environment:
Software Carpentry More practical.
The Turing Way Discusses why we use environments.
Essential Software Engineering for Researchers Quick overview.
The Problem¶
So without further ado lets talk about the problem weâll be working on, you donât necessarily need to understand the full details of this to learn the important lessons, but I will give a quick summary here. We want to simulate a physical model called the Ising model, which is famous in physics because itâs about the simplest thing you can come up with that displays a phase transition, a special kind of shift between two different behaviours.
Iâm going to weave exposition and code here so donât mind if I just take a moment to impor some packages and do some housekeeping:
import numpy as np
import matplotlib.pyplot as plt
# This loads some custom styles for matplotlib
import json, matplotlib
with open("../_static/matplotlibrc.json") as f:
matplotlib.rcParams.update(json.load(f))
np.random.seed(
42
) # This makes our random numbers reproducible when the notebook is rerun in order
Weâre going to be working with arrays of numbers, so it will make sense to work with NumPy
, and weâll also want to plot things, the standard choice for this is Matplotlib
, though there are other options, pandas
and Plotly
being notable ones.
Let me quickly plot something to aid the imagination:
state = np.random.choice([-1, 1], size=(100, 100))
def show_state(state, ax=None):
if ax is None:
f, ax = plt.subplots()
ax.matshow(state, cmap="Greys", vmin=-1, vmax=1)
ax.set(xticks=[], yticks=[])
show_state(state)
In my head, the Ising model is basically all about peer pressure. Youâre a tiny creature, and you live in a little world where you can only be one of two things, up/down, left/right, in/out doesnât matter.
But what does matter is that youâre doing the same thing as youâre neighbours. Weâre going to visualise this with images like the above, representing the two different camps, though at the moment what Iâve plotted is random, thereâs no peer pressure going on yet.
The way that a physicist would quantify this peer pressure is to assign a number to each state, lower numbers meaning more of the little creatures are doing the same thing as their neighbours. Weâll call this the Energy, because physicists always call things Energy, thatâs just what we do.
To calculate the energy what weâre going to do is look at all the pixels/creatures, and for each one, we look at the four neighbours to the N/E/S/W, every time we find a neighbour that agrees, weâll subtract 1 from our total and every time we find neighbours that disagree weâll add 1 to our total. Creatures at the edges will simply have fewer neighbours to worry about.
Iâll show you what the equation for this looks like, but donât worry too much about it, the word description should be enough to write some code. If we assign the ith creature the label \(s_i = \pm1\) then the energy is
Ok letâs do some little tests, letâs make the all up, all down and random state and see if we can compute their energies.
all_up = np.ones([100, 100])
all_down = -np.ones([100, 100])
random = np.random.choice([-1, 1], size=(100, 100))
from matplotlib.image import imread
custom = (
1 - 2 * imread("data/test_state.png")[:, :, 0]
) # load a 100x100 png, take the red channel, remap 0,1 to -1,1
states = [all_up, all_down, random, custom]
f, axes = plt.subplots(ncols=4, figsize=(20, 5))
for ax, state in zip(axes, states):
show_state(state, ax=ax)
If you stare at the first two long enough youâll realise we can figure out the energy of all_up and all_down without writing any code at all:
So we know that for the first two:
And for the random case we can make a pretty good guess that it should be zero on average. And the last we will just use to as a testcase.
def E_prediction_all_the_same(L):
return -(4 * (L - 2) ** 2 + 12 * (L - 2) + 8)
L = 100
print(f"For L = {L}, We predict E = {E_prediction_all_the_same(L)}")
For L = 100, We predict E = -39600
Exercise 1: Write a function to compute the energy of a state¶
See if you can write a function that calculates the energy and reproduces all the above cases.
# you are welcome to solve this in any way you like, I've just filled out a very simple way to do it
def energy(state):
E = 0
N, M = state.shape
for i in range(N):
for k in range(M):
# your code goes here
pass
return E / (N * M)
expected_values = [
E_prediction_all_the_same(100),
E_prediction_all_the_same(100),
0,
"???",
]
f, axes = plt.subplots(ncols=2, nrows=2, figsize=(10, 10))
axes = axes.flatten()
for ax, state, exp_value in zip(axes, states, expected_values):
show_state(state, ax=ax)
ax.text(
0,
-0.1,
f"Expected Value: {exp_value}, Your value: {energy(state)}",
transform=ax.transAxes,
)
Solution 1¶
def energy(state):
E = 0
N, M = state.shape
for i in range(N):
for j in range(M):
# handle the north and south neighbours
for di in [1, -1]:
if 0 <= (i + di) < N:
E -= state[i, j] * state[i + di, j]
# handle the east and west neighbours
for dj in [1, -1]:
if 0 <= (j + dj) < M:
E -= state[i, j] * state[i, j + dj]
return E
expected_values = [
E_prediction_all_the_same(100),
E_prediction_all_the_same(100),
0,
"???",
]
f, axes = plt.subplots(ncols=2, nrows=2, figsize=(10, 10))
axes = axes.flatten()
for ax, state, exp_value in zip(axes, states, expected_values):
show_state(state, ax=ax)
ax.text(
0,
-0.1,
f"Expected Value: {exp_value}, Your value: {energy(state)}",
transform=ax.transAxes,
)
Itâs a bit tricky to know what to do with the random value, letâs try running it 100 times and see what we get:
L = 100 # How large the system should be
N = 100 # How many random samples to use
energies = [energy(np.random.choice([-1, 1], size=(L, L))) for _ in range(N)]
plt.hist(energies)
print(f"mean = {np.mean(energies)}, standard error = {np.std(energies) / np.sqrt(N)}")
mean = -0.24, standard error = 29.819071481184658
If you run this a few times youâll see the mean is usually within a few standard errors of 0, which gives us some confidence. In the testing section we will discuss how we might go about doing automated tests of random variables like this.
Making it a little faster¶
This project is not intended to focus on optimising for performance, but it is worth putting a little effort into making this function faster so that we can run experiments more quickly later.
The main thing that slows us down here is that weâve written a âtight loopâ in pure python, the energy function is just a loop over the fundamental operation:
E -= state[i,j] * state[i+di, j]
which in theory only requires a few memory load operations, a multiply, an add and a store back to memory (give or take). However, because Python is such a dynamic language, it will have to do extra things like check the type and methods of state
and E
, invoke their array access methods object.__get__
, etc. We call this extra work overhead.
In most cases the ratio of overhead to actual computation is not too bad, but here because the fundamental computation is so simple itâs likely the overhead accounts for much more of the overall time.
In scientific python like this there are usually two main options for reducing the overhead:
Using Arrays¶
One way is we work with arrays of numbers and operations defined over those arrays such as sum
, product
etc. NumPy
is the canonical example of this in Python, but many machine learning libraries are essentially doing a similar thing. We rely on the library to implement the operations efficiently and try to chain those operations together to achieve what we want. This imposes some limitations on the way we can write our code.
Using Compilation¶
The alternative is that we convert our Python code into a more efficient form that incurs less overhead. This requires a compilation or transpilation step and imposes a different set of constraints on the code.
Itâs a little tricky to decide which of the two approaches will work best for a given problem. My advice would be to have some familiarity with both but ultimately to use what makes your development experience the best, since youâll likely spend more time writing the code than you will waiting for it to run!
Exercise 2: Write a faster version of energy(state)
¶
You can use numpy
, numba
, cython
, or anything else, by what factor can you beat the naive approach? Numba is probably the easiest here.
def test_energy_function(energy_function):
assert np.all(energy_function(state) == energy(state) for state in states)
def time_energy_function(energy_function):
return [energy_function(state) for state in states]
def your_faster_energy_function(state):
return energy(
state
) # <-- replace this with your implementation and compare how fast it runs!
print("Naive baseline implementation")
test_energy_function(
energy
) # this should always pass because it's just comparing to itself!
naive = %timeit -o -n 10 time_energy_function(energy)
print("\nYour version")
test_energy_function(your_faster_energy_function)
yours = %timeit -o -n 10 time_energy_function(your_faster_energy_function)
print(f"Your speedup: {naive.best/yours.best :.0f}x !")
Naive baseline implementation
80.9 ms ± 2.94 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Your version
80.2 ms ± 4.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Your speedup: 1x !
Solution 2¶
def test_energy_function(energy_function):
return [energy_function(state) for state in states]
from numba import jit
@jit(nopython=True)
def numba_energy(state):
E = 0
N, M = state.shape
for i in range(N):
for j in range(M):
# handle the north and south neighbours
for di in [1, -1]:
if 0 <= (i + di) < N:
E -= state[i, j] * state[i + di, j]
# handle the east and west neighbours
for dj in [1, -1]:
if 0 <= (j + dj) < M:
E -= state[i, j] * state[i, j + dj]
return E
def numpy_energy(state):
E = -np.sum(state[:-1, :] * state[1:, :]) - np.sum(state[:, :-1] * state[:, 1:])
return 2 * E
print("Naive baseline implementation")
naive = %timeit -o -n 10 time_energy_function(energy)
print("\nNumba version")
test_energy_function(numba_energy)
numba = %timeit -n 10 -o time_energy_function(numba_energy)
print(f"Numba Speedup: {naive.best/numba.best :.0f}x !")
print("\nNumpy version")
test_energy_function(numpy_energy)
numpy = %timeit -n 10 -o time_energy_function(numpy_energy)
print(f"Numpy Speedup: {naive.best/numpy.best :.0f}x !")
Naive baseline implementation
81.2 ms ± 3.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Numba version
212 ”s ± 20.5 ”s per loop (mean ± std. dev. of 7 runs, 10 loops each)
Numba Speedup: 420x !
Numpy version
153 ”s ± 37.5 ”s per loop (mean ± std. dev. of 7 runs, 10 loops each)
Numpy Speedup: 576x !
While writing the above faster versions I realised two things, first there was a bug in my code! I had written
for dj in [1,-1]:
if 0 <= (j + dj) < M:
E -= state[i,j] * state[i+di, j]
where I of course meant to have:
for dj in [1,-1]:
if 0 <= (j + dj) < M:
E -= state[i,j] * state[i, j+dj]
I found this while writing the numpy version because the two functions were not giving the same results, I didnât pick it up from writing the numba code because as you can see I just copied the implementation over. So the first lesson is that simple test cases donât always catch even relatively obvious bugs.
The second thing was that even my naive function was doing more work than it needed to! Because if the symmetry of how two neighbours talks to each other, we can rewrite the code as:
def energy2(state):
E = 0
N, M = state.shape
for i in range(N):
for j in range(M):
# handle the north and south neighbours
if 0 <= (i + 1) < N:
E -= state[i, j] * state[i + 1, j]
# handle the east and west neighbours
if 0 <= (j + 1) < M:
E -= state[i, j] * state[i, j + 1]
return 2 * E
print("\nImproved Naive version")
energy2(states[0]) # run the function once to let numba compile it before timing it
e2 = %timeit -o test_energy_function(energy2)
print(f"Speedup: {naive.best/e2.best :.0f}x !")
Improved Naive version
34.6 ms ± 147 ”s per loop (mean ± std. dev. of 7 runs, 10 loops each)
Speedup: 2x !
Conclusion¶
So far weâve discussed the problem we want to solve, written a little code, tested it a bit and made some speed improvements.
In the next notebook we will package the code up into a little python package, this has two big benefits when using the code:
I wonât have to redefine the energy function we just wrote in the next notebook
It will help with testing and documenting our code later
%load_ext watermark
%watermark -n -u -v -iv -w -g -r -b -a "Thomas Hodson" -gu "T_Hodson"
Author: Thomas Hodson
Github username: T_Hodson
Last updated: Mon Jul 18 2022
Python implementation: CPython
Python version : 3.9.12
IPython version : 8.4.0
Git hash: 03657e08835fdf23a808f59baa6c6a9ad684ee55
Git repo: https://github.com/ImperialCollegeLondon/ReCoDE_MCMCFF.git
Git branch: main
matplotlib: 3.5.1
numpy : 1.21.5
json : 2.0.9
Watermark: 2.3.1