Markov Chain Monte Carlo for fun and profit

🎲 ⛓️ 👉 🧪

Packaging It Up

Before we write any more code, I want to put what we already have into a Python file and make it into an installable module. This will be useful both for importing code into these notebooks and for testing later.

Directory Structure

Before we can do any testing, it is best practice to structure and then package your code up as a Python project. You don’t have to do it like this, but it has the benefit that many other tutorials expect this layout, and it reduces friction for you later.

Like all things in programming, there are many opinions about how Python projects should be structured. As I write this, the structure of this repository is as follows (this is the lightly edited output of the tree command, if you’re interested):

.
├── CITATION.cff # This file describes how to cite the work contained in this repository.
├── LICENSE # Outlines what legal rights you have to use this software.
├── README.md # You are here!
├── docs
│   ├── ... # Files to do with making the documentation
│   └── learning
│       └── # The Jupyter notebooks that form the main body of this project
│
├── pyproject.toml # Machine readable information about the MCFF package
├── readthedocs.yml # Tells readthedocs.com how to build the documentation
├── environment.yml # A specification for building a conda environment including all the dependencies
├── setup.cfg # Machine readable information about the MCFF package
├── src
│   └── MCFF # The actual code!
│
└── tests # automated tests for the code

It looks pretty intimidating! But let’s quickly go through it. At the top level of most projects you’ll find on GitHub (and elsewhere) there is a group of files that describe the project as a whole or provide key project information. Not all projects will have all of these files and, indeed, there are a variety of other files that you may also see, so this is just an example of some of the more important ones:

  • README.md - An intro to the project

  • LICENSE - The software license that governs this project. There are a few standard ones people use.

  • environment.yml (or alternatives) - this lists what Python packages the project needs in a standard format (other languages have equivalents).

  • CITATION.cff - This is the new standard way to describe how a work should be cited, very useful for academic software.

Below that you will usually have directories breaking the project up into its main categories. Here I have src/ and docs/learning/.

Inside src/ we have a standard Python package directory structure.

Packaging

There are a few things going on here. Our actual code lives in MCFF/, which is wrapped up inside a src folder. The src layout is a convention related to pytest; check Packaging for pytest if you want the gory details.

Inside MCFF/ are the files that will become submodules, so that in Python we will be able to do things like:

from MCFF.ising_model import all_up_state, all_down_state, random_state
from MCFF import mcmc  # once we've written this, that is!
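
To make the link between folders and imports concrete, here is a minimal sketch that builds a throwaway package inside a temporary src/ directory and imports a submodule from it. The names demo_pkg, states and all_up are hypothetical and exist only for this demonstration; they are not part of MCFF. Putting src/ on sys.path by hand stands in for what installing the project would normally take care of.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, used only for this demonstration.
PACKAGE = "demo_pkg"

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    package_dir = src / PACKAGE
    package_dir.mkdir(parents=True)

    # An empty __init__.py marks the folder as a regular package.
    (package_dir / "__init__.py").write_text("")
    # Each .py file inside the package becomes a submodule.
    (package_dir / "states.py").write_text(
        "def all_up(n):\n"
        "    return [[1] * n for _ in range(n)]\n"
    )

    # Imitate an install by making src/ importable for a moment.
    sys.path.insert(0, str(src))
    try:
        importlib.invalidate_caches()
        states = importlib.import_module(f"{PACKAGE}.states")
        grid = states.all_up(2)
    finally:
        sys.path.remove(str(src))

print(grid)
```

The same mechanism is what lets a file such as ising_model.py inside MCFF/ be imported as MCFF.ising_model once the package is installed.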

pyproject.toml and setup.cfg are the current way to describe the metadata about a Python package, such as how it should be installed and who the author is. Typically you just copy the standard layouts and build from there. The empty __init__.py file flags that the folder containing it is a Python package.

pyproject.toml:

[build-system]
requires = ["setuptools>=42"]
build-backend = "setuptools.build_meta"

setup.cfg:

[metadata]
name = MCFF
version = 0.0.1
author = Tom Hodson
author_email = tch14@ic.ac.uk
description = A small example package
long_description = file: README.md
long_description_content_type = text/markdown
url = https://github.com/TomHodson/MCMC_for_fun_and_profit
classifiers =
    Programming Language :: Python :: 3
    License :: OSI Approved :: BSD License
    Operating System :: OS Independent

[options]
package_dir =
    = src
packages = find:
python_requires = >=3.6
install_requires =
    numpy == 1.21
    scipy == 1.7
    matplotlib == 3.5
    numba == 0.55

[options.extras_require]
dev =
    pytest == 7.1       # Testing
    pytest-cov == 3.0   # For Coverage testing
    hypothesis == 6.29  # Property based testing
    pre-commit == 2.20
    jupyterlab == 3.4.3
    ipykernel == 6.9  # Allows this conda environment to show up automatically in Jupyter Lab
    watermark == 2.3  # Generates a summary of package version for use inside Jupyter Notebooks

docs =
    sphinx == 5.0.0
    myst-nb == 0.16.0

[options.packages.find]
where = src
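
Because setup.cfg is an INI-style file, you can inspect it with the standard library if you want to see how the sections fit together. The sketch below reads a trimmed excerpt of the file above and lists the core and optional dependency groups. It is only an illustration: setuptools does its own parsing, and the helper requirement_lines is a hypothetical name introduced here.

```python
import configparser

# A trimmed excerpt of the setup.cfg shown above.
SETUP_CFG = """
[metadata]
name = MCFF
version = 0.0.1

[options]
install_requires =
    numpy == 1.21
    scipy == 1.7

[options.extras_require]
dev =
    pytest == 7.1       # Testing
    hypothesis == 6.29  # Property based testing
docs =
    sphinx == 5.0.0
"""


def requirement_lines(raw_value):
    """Split a multi-line value into requirements, dropping trailing comments."""
    lines = (line.split("#", 1)[0].strip() for line in raw_value.splitlines())
    return [line for line in lines if line]


parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

name = parser["metadata"]["name"]
# install_requires holds what every user of the package needs.
core = requirement_lines(parser["options"]["install_requires"])
# Each key under extras_require is an optional group such as dev or docs.
extras = {
    group: requirement_lines(value)
    for group, value in parser["options.extras_require"].items()
}

print(name, core)
for group, requirements in extras.items():
    print(f"[{group}]", requirements)
```

The group names printed here (dev and docs) are the same names that go inside the square brackets when installing with pip.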

Phew, that was a lot. Python packaging has evolved a lot over the years, and as a consequence there is a lot of out-of-date advice and there are many other ways to do this. Your best bet for figuring out the current best practice is to consult official sources like python.org.

Once all that is set up, from the top level of the project you can run:

pip install --editable ".[dev,docs]"

The dot means we should install MCFF from the current directory, and --editable means to install it as an editable package so that we can edit the files in MCFF without having to reinstall. This is really useful for development. [dev,docs] means we also want to install the packages needed to develop this repository and to build the documentation. Both of those will become relevant later!
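
If you want to confirm that the install worked, one option is to ask Python which version of the distribution it can see. This is a minimal sketch using importlib.metadata from the standard library (Python 3.8 or newer); installed_version is a hypothetical helper name, and "MCFF" is the name given in the [metadata] section of setup.cfg.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# "MCFF" is the name given in the [metadata] section of setup.cfg.
version = installed_version("MCFF")
if version is None:
    print("MCFF is not installed in this environment")
else:
    print(f"MCFF {version} is installed")
```

With an editable install you would also expect MCFF.__file__ to point into the src/MCFF folder of your checkout, which is a quick way to check that edits to the source will be picked up.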

In the next notebook, we will finally write the Markov Chain Monte Carlo function! And if you found yourself frustrated while dealing with Python packaging, you can at least take solace in the fact that you’re not the only one:

[Image: An xkcd comic with a diagram of p values, saying that small ones are highly significant and giving humorous excuses for why larger ones are still interesting]

%load_ext watermark
%watermark -n -u -v -iv -w -g -r -b -a "Thomas Hodson" -gu "T_Hodson"
Author: Thomas Hodson

Github username: T_Hodson

Last updated: Mon Jul 18 2022

Python implementation: CPython
Python version       : 3.9.12
IPython version      : 8.4.0

Git hash: 03657e08835fdf23a808f59baa6c6a9ad684ee55

Git repo: https://github.com/ImperialCollegeLondon/ReCoDE_MCMCFF.git

Git branch: main

Watermark: 2.3.1