Markov Chain Monte Carlo for fun and profit
Packaging It Up
Before we proceed with writing any more code, I want to put what we already have into a Python file and make it into an installable module. This will be useful both for importing code into these notebooks and for testing later.
Directory Structure
Before we can do any testing, it is best practice to structure and then package your code up as a Python project. You don't have to do it like this, but it carries the benefit that many other tutorials expect you to, and generally you want to reduce friction for yourself later.
Like all things programming, there are many opinions about how Python projects should be structured. As I write this, the structure of this repository is as follows (this is the lightly edited output of the tree command, if you're interested):
.
├── CITATION.cff # This file describes how to cite the work contained in this repository.
├── LICENSE # Outlines what legal rights you have to use this software.
├── README.md # You are here!
├── docs
│   ├── ... # Files to do with making the documentation
│   └── learning
│       └── # The Jupyter notebooks that form the main body of this project
│
├── pyproject.toml # Machine readable information about the MCFF package
├── readthedocs.yml # Tells readthedocs.com how to build the documentation
├── environment.yml # A specification for building a conda environment including all the dependencies
├── setup.cfg # Machine readable information about the MCFF package
├── src
│   └── MCFF # The actual code!
│
└── tests # Automated tests for the code
It looks pretty intimidating! But let's quickly go through it. At the top level of most projects you'll find on GitHub (and elsewhere) there is a group of files that describe the project as a whole or provide key project information. Not all projects will have all of these files and, indeed, there are a variety of other files that you may also see, so this is just a sample of the more important ones:
- README.md: An intro to the project.
- LICENSE: The software license that governs this project. There are a few standard ones people use.
- environment.yml (or alternatives): Lists what Python packages the project needs in a standard format (other languages have equivalents).
- CITATION.cff: The new standard way to describe how a work should be cited, very useful for academic software.
Then below that you will usually have directories breaking the project up into main categories. Here I have src/ and docs/learning/.
Inside src/ we have a standard Python package directory structure.
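As a rough illustration, the src layout described above could be scaffolded with a few lines of Python. This is only a sketch: the helper name make_skeleton is hypothetical, and the module file names ising_model.py and mcmc.py are inferred from the imports used later in this notebook rather than from a listing of the real directory.

```python
from pathlib import Path
import tempfile


def make_skeleton(root):
    """Create an empty src-layout skeleton under `root` (hypothetical helper)."""
    root = Path(root)
    package = root / "src" / "MCFF"
    package.mkdir(parents=True, exist_ok=True)
    (root / "tests").mkdir(exist_ok=True)
    # An empty __init__.py marks the folder as an importable package.
    for name in ("__init__.py", "ising_model.py", "mcmc.py"):
        (package / name).touch()
    # Top-level metadata files, left empty in this sketch.
    for name in ("pyproject.toml", "setup.cfg", "README.md", "LICENSE"):
        (root / name).touch()
    return package


# Build the skeleton in a throwaway directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    package_dir = make_skeleton(tmp)
    created = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*"))
    print("\n".join(created))
```

Running it in a temporary directory keeps your real project untouched; pass a real path to make_skeleton only if you actually want the files created there.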
Packaging
There are a few things going on here. Our actual code lives in MCFF/, which is wrapped up inside a src folder. The src layout is a convention related to pytest; check Packaging for pytest if you want the gory details.
Inside MCFF/ we have the files that will become submodules, so that in Python we will be able to do things like:
from MCFF.ising_model import all_up_state, all_down_state, random_state
from MCFF import mcmc #once we've written this that is!
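For these imports to work, src/MCFF/ising_model.py has to define the three functions. Here is a minimal sketch of what that file might contain, assuming (this section does not confirm it) that a state is an N by N NumPy array of +1 and -1 spins. The optional rng argument is my own addition for reproducibility, and the real module may differ.

```python
import numpy as np


def all_up_state(N):
    """Return an N x N array of spins that all point up (+1)."""
    return np.ones((N, N), dtype=int)


def all_down_state(N):
    """Return an N x N array of spins that all point down (-1)."""
    return -np.ones((N, N), dtype=int)


def random_state(N, rng=None):
    """Return an N x N array of spins drawn uniformly from {-1, +1}.

    `rng` is an optional numpy Generator (an assumption of this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    return rng.choice(np.array([-1, 1]), size=(N, N))
```

Once the package is installed, a notebook can import these by name without caring where the file lives on disk.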
pyproject.toml and setup.cfg are the current way to describe the metadata about a Python package, such as how it should be installed and who the author is, but typically you just copy the standard layouts and build from there. The empty __init__.py file flags that this folder is a Python package.
pyproject.toml:
[build-system]
requires = ["setuptools>=42"]
build-backend = "setuptools.build_meta"
setup.cfg:
[metadata]
name = MCFF
version = 0.0.1
author = Tom Hodson
author_email = tch14@ic.ac.uk
description = A small example package
long_description = file: README.md
long_description_content_type = text/markdown
url = https://github.com/TomHodson/MCMC_for_fun_and_profit
classifiers =
    Programming Language :: Python :: 3
    License :: OSI Approved :: BSD License
    Operating System :: OS Independent
[options]
package_dir =
    = src
packages = find:
python_requires = >=3.6
install_requires =
    numpy == 1.21
    scipy == 1.7
    matplotlib == 3.5
    numba == 0.55
[options.extras_require]
dev =
    pytest == 7.1 # Testing
    pytest-cov == 3.0 # For Coverage testing
    hypothesis == 6.29 # Property based testing
    pre-commit == 2.20
    jupyterlab == 3.4.3
    ipykernel == 6.9 # Allows this conda environment to show up automatically in Jupyter Lab
    watermark == 2.3 # Generates a summary of package version for use inside Jupyter Notebooks
docs =
    sphinx == 5.0.0
    myst-nb == 0.16.0
[options.packages.find]
where = src
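If you want to check how a file like this is read, the standard library's configparser can parse it. The sketch below embeds a shortened copy of the file as a string so that it runs anywhere, and uses a hypothetical helper called read_list. setuptools does its own processing on top of this, so treat it as a way to inspect the file rather than a description of what pip does.

```python
import configparser

# A shortened copy of the setup.cfg above. Values that span several lines
# must have their continuation lines indented.
SETUP_CFG = """
[metadata]
name = MCFF
version = 0.0.1

[options]
package_dir =
    = src
packages = find:
install_requires =
    numpy == 1.21
    scipy == 1.7

[options.extras_require]
dev =
    pytest == 7.1 # Testing
"""


def read_list(parser, section, key):
    """Split a multi-line value into a list, dropping blanks and trailing comments."""
    lines = parser.get(section, key).splitlines()
    return [line.split(" #")[0].strip() for line in lines if line.strip()]


parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

name = parser.get("metadata", "name")
requires = read_list(parser, "options", "install_requires")
dev_requires = read_list(parser, "options.extras_require", "dev")
print(name, requires, dev_requires)
```

To inspect the real file instead of the embedded string, parser.read("setup.cfg") from the top level of the project should work the same way.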
Phew, that was a lot. Python packaging has evolved a lot over the years, and the consequence is that there is a lot of out-of-date advice and there are many other ways to do this. Your best bet for figuring out the current best practice is to consult official sources like python.org.
Once all that is set up, from the top level of the project you can run:
pip install --editable ".[dev,docs]"
The dot means we should install MCFF from the current directory, and --editable means to do it as an editable package so that we can edit the files in MCFF and not have to reinstall. This is really useful for development. [dev,docs] means we also want to install the packages that are needed to develop this repository and to build the documentation; both of those will become relevant later!
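One way to confirm that the editable install worked is to ask Python where it finds the package. The sketch below uses importlib from the standard library (importlib.metadata needs Python 3.8 or newer). The name describe_install is a hypothetical helper, and the exact path printed depends on where your copy of the repository lives. It returns None if the package is not installed, so it is safe to run either way.

```python
import importlib.util
from importlib import metadata


def describe_install(package_name):
    """Return (version, location) for an installed package, or None if absent."""
    try:
        version = metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None
    # find_spec locates the importable module without importing it.
    spec = importlib.util.find_spec(package_name)
    location = spec.origin if spec is not None else None
    return version, location


# With an editable install, the location should point into src/MCFF in your
# working copy rather than into site-packages.
print(describe_install("MCFF"))
```

If the location points into site-packages instead, the package was probably installed without --editable, and edits to the source files will not be picked up until you reinstall.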
In the next notebook, we will finally write the Markov Chain Monte Carlo function! And if you found yourself frustrated while dealing with Python packaging, you can at least take solace in the fact that you're not the only one.
%load_ext watermark
%watermark -n -u -v -iv -w -g -r -b -a "Thomas Hodson" -gu "T_Hodson"
Author: Thomas Hodson
Github username: T_Hodson
Last updated: Mon Jul 18 2022
Python implementation: CPython
Python version : 3.9.12
IPython version : 8.4.0
Git hash: 03657e08835fdf23a808f59baa6c6a9ad684ee55
Git repo: https://github.com/ImperialCollegeLondon/ReCoDE_MCMCFF.git
Git branch: main
Watermark: 2.3.1