In addition to installing the jagsUI package, we also need to install the free JAGS software, which is distributed separately from the R package.
Once that’s installed, load the jagsUI library:
library(jagsUI)
jagsUI Workflow
We’ll use the longley dataset to conduct a simple linear regression.
The dataset is built into R.
data(longley)
head(longley)
#      GNP.deflator     GNP Unemployed Armed.Forces Population Year Employed
# 1947         83.0 234.289      235.6        159.0    107.608 1947   60.323
# 1948         88.5 259.426      232.5        145.6    108.632 1948   61.122
# 1949         88.2 258.054      368.2        161.6    109.773 1949   60.171
# 1950         89.5 284.599      335.1        165.0    110.929 1950   61.187
# 1951         96.2 328.975      209.9        309.9    112.075 1951   63.221
# 1952         98.1 346.999      193.2        359.4    113.270 1952   63.639
We will model the number of people employed (Employed) as a function of Gross National Product (GNP).
Each column of data is saved into a separate element of our data list.
Finally, we add a list element for the number of data points n.
In general, elements in the data list must be numeric, and structured as arrays, matrices, or scalars.
jags_data <- list(
  gnp = longley$GNP,
  employed = longley$Employed,
  n = nrow(longley)
)
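The requirement that every element be numeric can be checked before JAGS is ever called. The following is a minimal sketch using only base R; the helper name check_jags_data is hypothetical and not part of jagsUI.

```r
# Rebuild the data list from the built-in longley dataset
data(longley)
jags_data <- list(
  gnp = longley$GNP,
  employed = longley$Employed,
  n = nrow(longley)
)

# Hypothetical helper (not part of jagsUI): TRUE for each numeric element
check_jags_data <- function(x) {
  vapply(x, is.numeric, logical(1))
}

ok <- check_jags_data(jags_data)
print(ok)

# The vector lengths should agree with n
data_lengths <- lengths(jags_data[c("gnp", "employed")])
print(data_lengths)
```

A character or factor column would show up as FALSE here and would need to be converted (for example to an integer index) before being passed to JAGS.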
Next we’ll describe our model in the BUGS language. See the JAGS manual for detailed information on writing models for JAGS. Note that the names of the data you reference in the BUGS model must exactly match the names in the list we just created. There are various ways to save the model file; we’ll save it as a temporary file.
# Create a temporary file
modfile <- tempfile()
# Write model to file
writeLines("
model{
  # Likelihood
  for (i in 1:n){
    # Model data
    employed[i] ~ dnorm(mu[i], tau)
    # Calculate linear predictor
    mu[i] <- alpha + beta*gnp[i]
  }
  # Priors
  alpha ~ dnorm(0, 0.00001)
  beta ~ dnorm(0, 0.00001)
  sigma ~ dunif(0,1000)
  tau <- pow(sigma,-2)
}
", con=modfile)
Initial values can be specified as a list of lists, with one list element per MCMC chain.
Each list element should itself be a named list corresponding to the values we want each parameter initialized at.
We don’t necessarily need to explicitly initialize every parameter.
We can also just set inits = NULL to allow JAGS to do the initialization automatically, but this will not work for some complex models.
We can also provide a function which generates a list of initial values, which jagsUI will execute for each MCMC chain.
This is what we’ll do below.
inits <- function(){
  list(alpha=rnorm(1,0,1),
       beta=rnorm(1,0,1),
       sigma=runif(1,0,3)
  )
}
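For comparison, the list-of-lists form described above might look like the following sketch. The object name inits_list is hypothetical; it holds one named list per chain and would be supplied in place of the function.

```r
set.seed(1)  # only so that this example is reproducible
n_chains <- 3

# One named list of starting values per MCMC chain
inits_list <- lapply(seq_len(n_chains), function(chain) {
  list(alpha = rnorm(1, 0, 1),
       beta  = rnorm(1, 0, 1),
       sigma = runif(1, 0, 3))
})

str(inits_list)
```

The function form is usually more convenient because it does not have to be rewritten when the number of chains changes.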
Next, we choose which parameters from the model file we want to save posterior distributions for.
We’ll save the parameters for the intercept (alpha), slope (beta), and residual standard deviation (sigma).
params <- c('alpha','beta','sigma')
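Note that the model works with a precision, tau, while the parameter we save is the standard deviation, sigma. In the BUGS language the second argument of dnorm is a precision (1 / variance), unlike dnorm in R, which takes a standard deviation. A short base-R sketch of the conversion follows; the value of sigma_example is arbitrary and chosen only for illustration.

```r
# BUGS/JAGS dnorm(mean, precision): precision = 1 / sd^2
prior_precision <- 0.00001
prior_sd <- 1 / sqrt(prior_precision)
prior_sd  # roughly 316, i.e. a very vague prior on alpha and beta

# The model line tau <- pow(sigma, -2) is the same conversion in reverse
sigma_example <- 0.7      # arbitrary illustrative value, not an estimate
tau_example <- sigma_example^(-2)
all.equal(1 / sqrt(tau_example), sigma_example)
```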
We’ll run 3 MCMC chains (n.chains = 3).
JAGS will start each chain by running adaptive iterations, which are used to tune and optimize MCMC performance.
We will manually specify the number of adaptive iterations (n.adapt = 100).
You can also try n.adapt = NULL, which will keep running adaptation iterations until JAGS reports adaptation is sufficient.
In general you do not want to skip adaptation.
Next we need to specify how many regular iterations to run in each chain in total.
We’ll set this to 1000 (n.iter = 1000).
We’ll specify the number of burn-in iterations at 500 (n.burnin = 500).
Burn-in iterations are discarded, so here we’ll end up with 500 iterations per chain (1000 total - 500 burn-in).
We can also set the thinning rate: with n.thin = 2 we’ll keep only every 2nd iteration.
Thus we will have 250 iterations saved per chain ((1000 - 500) / 2).
The optimal MCMC settings will depend on your specific dataset and model.
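The bookkeeping above can be written out as a few lines of arithmetic, which is handy when trying different settings. This is only a sketch of the calculation described in the text, using plain R variables named after the corresponding jags arguments.

```r
n.chains <- 3
n.iter   <- 1000
n.burnin <- 500
n.thin   <- 2

# Iterations kept per chain after burn-in and thinning
per_chain <- (n.iter - n.burnin) / n.thin
# Total posterior samples across all chains
total <- per_chain * n.chains

c(per_chain = per_chain, total = total)
# per_chain = 250, total = 750
```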
We’re finally ready to run JAGS, via the jags function.
We provide our data to the data argument, initial values function to inits, our vector of saved parameters to parameters.to.save, and our model file path to model.file.
After that we specify the MCMC settings described above.
out <- jags(data = jags_data,
            inits = inits,
            parameters.to.save = params,
            model.file = modfile,
            n.chains = 3,
            n.adapt = 100,
            n.iter = 1000,
            n.burnin = 500,
            n.thin = 2)
#
# Processing function input.......
#
# Done.
#
# Compiling model graph
# Resolving undeclared variables
# Allocating nodes
# Graph information:
# Observed stochastic nodes: 16
# Unobserved stochastic nodes: 3
# Total graph size: 74
#
# Initializing model
#
# Adaptive phase, 100 iterations x 3 chains
# If no progress bar appears JAGS has decided not to adapt
#
#
# Burn-in phase, 500 iterations x 3 chains
#
#
# Sampling from joint posterior, 500 iterations x 3 chains
#
#
# Calculating statistics.......
#
# Done.
We should see information and progress bars in the console.
If we have a long-running model and a powerful computer, we can tell jagsUI to run each chain on a separate core in parallel by setting argument parallel = TRUE:
out <- jags(data = jags_data,
            inits = inits,
            parameters.to.save = params,
            model.file = modfile,
            n.chains = 3,
            n.adapt = 100,
            n.iter = 1000,
            n.burnin = 500,
            n.thin = 2,
            parallel = TRUE)
While this is usually faster, we won’t be able to see progress bars when JAGS runs in parallel.
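Whether parallel chains are worthwhile depends partly on how many cores the machine has. One possible way to decide, sketched with the parallel package that ships with R, is shown below; the variable use_parallel is hypothetical and would simply be passed on as the value of the parallel argument.

```r
# detectCores() comes from the base 'parallel' package and may return NA
n_cores  <- parallel::detectCores()
n_chains <- 3

# Only request parallel chains when there is at least one core per chain
use_parallel <- !is.na(n_cores) && n_cores >= n_chains
use_parallel
```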
Our first step is to look at the output object out:
out
# JAGS output for model '/tmp/RtmpDGbnas/file105b448763dc8', generated by jagsUI.
# Estimates based on 3 chains of 1000 iterations,
# adaptation = 100 iterations (sufficient),
# burn-in = 500 iterations and thin rate = 2,
# yielding 750 total samples from the joint posterior.
# MCMC ran for 0.001 minutes at time 2024-01-23 14:35:44.873617.
#
#            mean    sd   2.5%    50%  97.5% overlap0 f  Rhat n.eff
# alpha    51.851 0.802 50.269 51.897 53.392    FALSE 1 1.006   354
# beta      0.035 0.002  0.031  0.035  0.039    FALSE 1 1.010   281
# sigma     0.736 0.174  0.494  0.708  1.165    FALSE 1 1.007   750
# deviance 33.632 3.258 30.073 32.767 42.275    FALSE 1 1.020   302
#
# Successful convergence based on Rhat values (all < 1.1).
# Rhat is the potential scale reduction factor (at convergence, Rhat=1).
# For each parameter, n.eff is a crude measure of effective sample size.
#
# overlap0 checks if 0 falls in the parameter's 95% credible interval.
# f is the proportion of the posterior with the same sign as the mean;
# i.e., our confidence that the parameter is positive or negative.
#
# DIC info: (pD = var(deviance)/2)
# pD = 5.3 and DIC = 38.917
# DIC is an estimate of expected predictive error (lower is better).
We first get some information about the MCMC run.
Next we see a table of summary statistics for each saved parameter, including the mean, median, and 95% credible intervals.
The overlap0 column indicates if the 95% credible interval overlaps 0, and the f column is the proportion of posterior samples with the same sign as the mean.
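To make the overlap0 and f columns concrete, here is a minimal sketch of how they can be reproduced by hand from a vector of posterior draws. The draws below are simulated stand-ins roughly shaped like the beta posterior; they are not real model output, and this is not necessarily how jagsUI computes the columns internally.

```r
set.seed(42)
# Simulated stand-in for 750 posterior draws (NOT real JAGS output)
draws <- rnorm(750, mean = 0.035, sd = 0.002)

# 95% credible interval from the 2.5% and 97.5% quantiles
ci <- quantile(draws, probs = c(0.025, 0.975))

# overlap0: does the interval contain 0?
overlap0 <- ci[[1]] <= 0 && ci[[2]] >= 0

# f: proportion of draws with the same sign as the posterior mean
f <- mean(sign(draws) == sign(mean(draws)))

overlap0  # FALSE
f         # 1
```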
The out object is a list with many components:
names(out)
# [1] "sims.list" "mean" "sd" "q2.5" "q25"
# [6] "q50" "q75" "q97.5" "overlap0" "f"
# [11] "Rhat" "n.eff" "pD" "DIC" "summary"
# [16] "samples" "modfile" "model" "parameters" "mcmc.info"
# [21] "run.date" "parallel" "bugs.format" "calc.DIC"
We’ll describe some of these below.
We should pay special attention to the Rhat and n.eff columns in the output summary, which are MCMC diagnostics.
The Rhat (Gelman-Rubin diagnostic) values for each parameter should be close to 1 (typically, < 1.1) if the chains have converged for that parameter.
The n.eff value is the effective MCMC sample size and should ideally be close to the number of saved iterations across all chains (here 750, 3 chains * 250 samples per chain).
In this case, both diagnostics look good.
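For intuition about what Rhat measures, the sketch below computes a basic textbook version of the potential scale reduction factor from simulated chains. The function basic_rhat is hypothetical, the chains are stand-ins rather than JAGS output, and the value jagsUI reports may differ slightly because of additional corrections.

```r
set.seed(123)
n_iter   <- 250
n_chains <- 3

# Stand-in chains (one per column) drawn from the same distribution,
# which mimics chains that have converged
chains <- matrix(rnorm(n_iter * n_chains, mean = 51.85, sd = 0.8),
                 nrow = n_iter, ncol = n_chains)

# Basic Gelman-Rubin statistic: compares between- and within-chain variance
basic_rhat <- function(x) {
  n <- nrow(x)
  W <- mean(apply(x, 2, var))       # average within-chain variance
  B <- n * var(colMeans(x))         # between-chain variance
  var_hat <- (n - 1) / n * W + B / n
  sqrt(var_hat / W)
}

rhat_ok <- basic_rhat(chains)

# Shifting one chain mimics a lack of convergence and inflates the statistic
shifted <- chains
shifted[, 1] <- shifted[, 1] + 3
rhat_bad <- basic_rhat(shifted)

c(converged = rhat_ok, shifted = rhat_bad)
```

When the chains agree, the between-chain term adds almost nothing and the ratio stays near 1; when one chain sits elsewhere, the ratio grows well beyond the 1.1 rule of thumb.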
We can also visually assess convergence using the traceplot function:
traceplot(out)
We should see the lines for each chain overlapping and not trending up or down.
We can quickly visualize the posterior distributions of each parameter using the densityplot function:
densityplot(out)
The traceplots and posteriors can be plotted together using plot:
plot(out)
We can also generate a posterior plot manually.
To do this we’ll need to extract the actual posterior samples for a parameter.
These are contained in the sims.list element of out.
post_alpha <- out$sims.list$alpha
hist(post_alpha, xlab="Value", main = "alpha posterior")
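One quick sanity check on posteriors like this one is to compare them with an ordinary least-squares fit of the same regression. With priors as vague as those used here, the posterior means of alpha and beta would be expected to land close to the lm coefficients; this is a rough cross-check rather than part of the jagsUI workflow.

```r
data(longley)

# Least-squares fit of the same linear model
fit <- lm(Employed ~ GNP, data = longley)

ls_coef  <- coef(fit)            # intercept and slope, comparable to alpha and beta
ls_sigma <- summary(fit)$sigma   # residual standard error, comparable to sigma

ls_coef
ls_sigma
```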
If we need more iterations or want to save different parameters, we can use update:
# Now save mu also
params <- c(params, "mu")
out2 <- update(out, n.iter=300, parameters.to.save = params)
# Compiling model graph
# Resolving undeclared variables
# Allocating nodes
# Graph information:
# Observed stochastic nodes: 16
# Unobserved stochastic nodes: 3
# Total graph size: 74
#
# Initializing model
#
# Adaptive phase.....
# Adaptive phase complete
#
# No burn-in specified
#
# Sampling from joint posterior, 300 iterations x 3 chains
#
#
# Calculating statistics.......
#
# Done.
The mu parameter is now in the output:
out2
# JAGS output for model '/tmp/RtmpDGbnas/file105b448763dc8', generated by jagsUI.
# Estimates based on 3 chains of 1300 iterations,
# adaptation = 100 iterations (sufficient),
# burn-in = 1000 iterations and thin rate = 2,
# yielding 450 total samples from the joint posterior.
# MCMC ran for 0 minutes at time 2024-01-23 14:35:45.449282.
#
#            mean    sd   2.5%    50%  97.5% overlap0 f  Rhat n.eff
# alpha    51.829 0.767 50.398 51.812 53.247    FALSE 1 1.012   133
# beta      0.035 0.002  0.031  0.035  0.038    FALSE 1 1.010   154
# sigma     0.732 0.164  0.495  0.705  1.138    FALSE 1 0.999   450
# mu[1]    59.981 0.340 59.361 59.986 60.663    FALSE 1 1.013   127
# mu[2]    60.856 0.301 60.299 60.857 61.457    FALSE 1 1.013   129
# mu[3]    60.808 0.303 60.244 60.810 61.415    FALSE 1 1.013   129
# mu[4]    61.732 0.264 61.243 61.734 62.271    FALSE 1 1.012   134
# mu[5]    63.276 0.213 62.882 63.268 63.720    FALSE 1 1.009   165
# mu[6]    63.903 0.200 63.525 63.901 64.306    FALSE 1 1.008   195
# mu[7]    64.543 0.192 64.156 64.545 64.923    FALSE 1 1.006   248
# mu[8]    64.464 0.193 64.083 64.467 64.844    FALSE 1 1.006   240
# mu[9]    65.660 0.194 65.229 65.650 66.044    FALSE 1 1.003   410
# mu[10]   66.415 0.207 65.974 66.420 66.826    FALSE 1 1.003   450
# mu[11]   67.236 0.229 66.757 67.256 67.681    FALSE 1 1.002   450
# mu[12]   67.298 0.231 66.812 67.318 67.751    FALSE 1 1.002   450
# mu[13]   68.626 0.281 68.021 68.644 69.169    FALSE 1 1.003   409
# mu[14]   69.318 0.311 68.655 69.338 69.914    FALSE 1 1.004   359
# mu[15]   69.860 0.335 69.155 69.881 70.497    FALSE 1 1.004   328
# mu[16]   71.138 0.397 70.337 71.161 71.898    FALSE 1 1.005   279
# deviance 33.587 3.186 30.152 32.815 42.483    FALSE 1 1.002   450
#
# Successful convergence based on Rhat values (all < 1.1).
# Rhat is the potential scale reduction factor (at convergence, Rhat=1).
# For each parameter, n.eff is a crude measure of effective sample size.
#
# overlap0 checks if 0 falls in the parameter's 95% credible interval.
# f is the proportion of the posterior with the same sign as the mean;
# i.e., our confidence that the parameter is positive or negative.
#
# DIC info: (pD = var(deviance)/2)
# pD = 5.1 and DIC = 38.68
# DIC is an estimate of expected predictive error (lower is better).
This is a good opportunity to show the whiskerplot function, which plots the mean and 95% credible interval (CI) of parameters in the jagsUI output:
whiskerplot(out2, 'mu')
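The quantities drawn in such a plot can also be computed by hand. The sketch below assumes that the posterior draws for a vector parameter are stored as a matrix with one row per saved iteration and one column per element (which is how we would expect out2$sims.list$mu to be laid out); the matrix mu_draws here is a simulated stand-in with four columns, not real model output.

```r
set.seed(7)
n_draws <- 450

# Simulated stand-in for posterior draws of four mu elements (NOT real output);
# values fill column by column, so each column gets its own mean
mu_draws <- matrix(rnorm(n_draws * 4,
                         mean = rep(c(60, 61, 62, 63), each = n_draws),
                         sd = 0.3),
                   nrow = n_draws, ncol = 4)

# Mean and 95% interval for each column
mu_summary <- data.frame(
  mean  = colMeans(mu_draws),
  lower = apply(mu_draws, 2, quantile, probs = 0.025),
  upper = apply(mu_draws, 2, quantile, probs = 0.975)
)

mu_summary
```

With a real fit, replacing mu_draws with the matrix extracted from sims.list should give values matching the mean, 2.5%, and 97.5% columns of the printed summary.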