Hello my blog readers,

This post is to introduce a new Python package samplepy. This package was written to simplify sampling tasks that so often creep-up in machine learning. The package implements Importance, Rejection and Metropolis-Hastings sampling algorithms.

**samplepy** has a very simple API. The package can be installed with pip by simply running *pip install samplepy*. Once installed, you can use it to sample from any univariate distribution as following (showing rejection sampling use):

from samplepy import Rejection import matplotlib.pyplot as plt import numpy as np # define a unimodal function to sample under f = lambda x: 2.0*np.exp(-2.0*x) # instantiate Rejection sampling with f and required interval rej = Rejection(f, [0.01, 3.0]) # create a sample of 10K points sample = rej.sample(10000, 1) # plot the original function and the created sample set x = np.arange(0.01, 3.0, (3.0-0.01)/10000) fx = f(x) figure, axis = plt.subplots() axis.hist(sample, normed=1, bins=40) axis2 = axis.twinx() axis2.plot(x, fx, 'g', label="f(x)=2.0*exp(-2*x)") plt.legend(loc=1) plt.show()

The three sampling method (i.e. Rejection, Importance and MH) are quite different and will achieve slightly different results for the same function. Performance is another important difference factor, with Metropolis-Hastings probably being the slowest. Let’s compare how the three sampling algorithm deliver on a bi-modal univariate function:

from samplepy import Rejection, Importance, MH import matplotlib.pyplot as plt import numpy as np f = lambda x: np.exp(-1.0*x**2)*(2.0+np.sin(5.0*x)+np.sin(2.0*x)) interval = [-3.0, 3.0] rej = Rejection(f, interval) # instantiate Rejection sampling with f and interval sample = rej.sample(10000, 1) # create a sample of 10K points x = np.arange(interval[0], interval[1], (interval[1]-interval[0])/10000) fx = f(x) figure, axis = plt.subplots() axis.hist(sample, normed=1, bins=40) axis2 = axis.twinx() axis2.plot(x, fx, 'g', label="Rejection") plt.legend(loc=1) plt.show() mh = MH(f,interval) sample = mh.sample(20000, 100, 1) # Make sure we have enough points in the sample! figure, axis = plt.subplots() axis.hist(sample, normed=1, bins=40) axis2 = axis.twinx() axis2.plot(x, fx, 'g', label="MH") plt.legend(loc=1) plt.show() imp = Importance(f, interval) sample = imp.sample(10000, 0.0001, 0.0010) # create a sample where essentially no extra importance is given to any quantile figure, axis = plt.subplots() axis.hist(sample, normed=1, bins=40) axis2 = axis.twinx() axis2.plot(x, fx, 'g', label="Importance") plt.legend(loc=1) plt.show()

Hopefully this gives you enough examples to get you started using samplepy!