Running Parallel MCMC without specific R packages

Running multiple, distinct Markov chains that have a posterior density as their equilibrium density is useful to use convergence diagnostics such as the Gelman / Rubin diagnostic (Gelman & Rubin, 1992). In order to save time, one likes to run each Markov chain in parallel. There are a number of packages in R (parallel, snow,…) to run code in parallel. However, I find it more useful to write a script for a single chain and then dispatch it multiple times from the terminal. Here is an example with the MCMCpack: 

Save the following as “sim.R”:

substream <- as.numeric(commandArgs(trailingOnly = TRUE))


filename <- paste("posterior_", substream, ".Rdata" , sep="", collapse="")
out <- MCMCprobit(low~as.factor(race)+smoke, data=birthwt, 
 b0 = 0, B0 = 10, seed=list(NA, substream))

The first line grabs the seed argument from the terminal and pass it to MCMCprobit function which generates a single chain. The script can be executed from the terminal with four different substream seeds as follows:

Rscript --arch=x86_64 sim.R 1 & Rscript --arch=x86_64 sim.R 2 & Rscript --arch=x86_64 sim.R 3 & Rscript --arch=x86_64 sim.R 4

If you don’t have a 64-Bit machine, use “x86_32”. Provided that you have four cores, one core will be used for each of these jobs. Next, you can fuse the output together in an mcmc.list object and run the Gelman/Rubin diagnostic.

out1 <- out
out2 <- out
out3 <- out
out4 <- out

posteriors <- as.mcmc.list(list( as.mcmc(out1), as.mcmc(out2), 
 as.mcmc(out3), as.mcmc(out4)))

The great benefit of this approach is that the R code is less crammed with parallelization functions and the process can be monitored since the output is dumped right into the terminal window.

Gelman, Andrew & Donald B. Rubin. 1992. “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science 7(4):457–472.