STAN Introduction

Joseph Sartini

2025-01-31

Bayesian Modeling

\[P(\Theta|X) \propto P(X|\Theta)P(\Theta)\]

Further reading on choosing priors: https://svmiller.com/blog/2021/02/thinking-about-your-priors-bayesian-analysis/
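
As an illustrative example of this proportionality (not from the slides), a conjugate Beta prior paired with Bernoulli observations yields a closed-form posterior:

\[\Theta \sim \text{Beta}(a, b), \quad X_i \mid \Theta \overset{iid}{\sim} \text{Bernoulli}(\Theta) \;\Longrightarrow\; \Theta \mid X \sim \text{Beta}\Big(a + \textstyle\sum_{i=1}^{N} X_i,\; b + N - \textstyle\sum_{i=1}^{N} X_i\Big)\]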

Markov Chain Monte Carlo

  • Metropolis-Hastings (MH)

    • Proposal distribution

    • Acceptance probability (sketched in R after this list)

  • Gibbs Sampling

    • Special case of MH

    • Conditional distributions

    • \(P(\text{Acceptance}) = 1\)

  • Hamiltonian Monte Carlo

    • Special case of MH

    • Hamiltonian dynamics

    • \(P(\text{Acceptance}) = 1\)* (exactly 1 only if the dynamics are simulated exactly)
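
For concreteness, MH accepts a proposal \(\theta'\) drawn from \(q(\theta' \mid \theta)\) with probability \(\alpha = \min\left(1, \frac{P(\theta' \mid X)\,q(\theta \mid \theta')}{P(\theta \mid X)\,q(\theta' \mid \theta)}\right)\). A minimal random-walk MH sketch in R (illustrative only; the standard-normal target and proposal scale are assumptions, not from the slides):

# Random-walk Metropolis-Hastings sketch (illustrative target, not from the slides)
log_target = function(theta) dnorm(theta, mean = 0, sd = 1, log = TRUE)

mh_sample = function(n_iter, init = 0, proposal_sd = 1) {
  draws = numeric(n_iter)
  current = init
  for (i in seq_len(n_iter)) {
    proposal = rnorm(1, mean = current, sd = proposal_sd)  # symmetric proposal
    # Symmetric q cancels: accept with probability min(1, p(proposal) / p(current))
    if (log(runif(1)) < log_target(proposal) - log_target(current)) {
      current = proposal
    }
    draws[i] = current
  }
  draws
}

mh_draws = mh_sample(5000)  # correlated draws from the target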

Hamiltonian Monte Carlo

HMC Visualization: By Justinkunimune - github.com/jkunimune/hamiltonian-mc, CC0

Hamiltonian Monte Carlo Continued

  • MH with proposals generated by Hamiltonian dynamics

    • Randomly sample kinetic energy (momentum)

    • Negative log posterior acts as the potential energy field

    • Simulate the trajectory using the leapfrog integrator (sketched below)

    • Accept/reject the proposed stopping point

  • Lower correlation between samples

    • Physical model: travel further in the parameter space

    • Higher sampling efficiency

  • Energy conservation: high acceptance probability
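
Here the Hamiltonian is \(H(\theta, p) = U(\theta) + K(p)\), with potential \(U(\theta) = -\log P(\Theta \mid X)\) and kinetic energy \(K(p)\) from the sampled momentum \(p\). A bare-bones leapfrog step in R (a sketch assuming a user-supplied gradient function grad_U; not Stan's implementation):

# Leapfrog integrator sketch (grad_U is an assumed gradient of the potential)
leapfrog = function(theta, p, grad_U, eps, n_steps) {
  p = p - 0.5 * eps * grad_U(theta)           # half step for momentum
  for (s in seq_len(n_steps)) {
    theta = theta + eps * p                   # full step for position
    if (s < n_steps) p = p - eps * grad_U(theta)
  }
  p = p - 0.5 * eps * grad_U(theta)           # closing half step for momentum
  list(theta = theta, p = -p)                 # momentum flip keeps proposals reversible
}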

What is STAN?

  • Bayesian probabilistic programming language

  • Multiple posterior inference routines

    • Hamiltonian Monte Carlo

      • Adaptive step size

    • Variational Inference

    • Laplace approximation

  • Written in C++

  • Interfaces with Python, Julia, R, and Unix Shell

Structure of a STAN Script

functions {
  // ... function declarations and definitions ...
}
data {
  // ... declarations ...
}
transformed data {
   // ... declarations ... statements ...
}
parameters {
   // ... declarations ...
}
transformed parameters {
   // ... declarations ... statements ...
}
model {
   // ... declarations ... statements ...
}
generated quantities {
   // ... declarations ... statements ...
}

Section - functions

  • Repeated sub-tasks

    • Code brevity

    • Parallelization

  • Complex indexing

    • Sparsely observed data

  • Suffixes for particular functions

    • Containing RNG: “_rng”

    • Modifying target density: “_lp”

Section - data

  • Observed data

    • Corresponding indexing objects

  • All constants

    • Array extents

  • Commonly used linear transforms

Section - transformed data

  • Functions of data variables

    • e.g., eigenvalues of a matrix

  • Helpful for book-keeping

    • Simplify data inputs

  • Only evaluated once

    • Prior to sampling

Section - parameters

  • Specify sampled quantities

    • Variable names

    • Any constraints

  • Declarations only (no statements)

  • Can provide initial values via the interface (e.g., init in sampling())

Section - transformed parameters

  • Deterministic functions of parameters

  • Good for re-parameterization

    • Parameter expansion

    • Centered / Non-centered

  • Evaluated with each sample

    • Transformation vs. change of variables

    • Change of variables requires the inverse transform and the log absolute Jacobian determinant (formula below)
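
For reference, if \(x\) is the sampled parameter and the prior is placed on \(y = f(x)\), the change-of-variables formula in terms of the inverse transform is

\[p_Y(y) = p_X\big(f^{-1}(y)\big)\,\Big|\det J_{f^{-1}}(y)\Big|,\]

and the log absolute Jacobian determinant must be added to the target density by hand.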

Section - model

  • Define the target posterior

    • Log density

  • Prior distributions on (transformed) parameters

  • Data/model likelihood

  • Most of the computational expense

  • ORDER MATTERS: statements execute top to bottom

Section - generated quantities

  • Functions of model output

    • Predictions for new data

    • Simulations based on parameters

    • Extract posterior estimates

    • Calculate model fit criteria

  • Executed after samples are generated

Example Model - Simple GLM

data {
  int N;  // Number of observations
  int P;  // Number of fixed effect covariates
  
  array[N] int<lower=0, upper=1> Y;  // Binary outcomes
  matrix[N, P] X;                    // Fixed effects design matrix
}
parameters {
   vector[P] beta;  // Coefficients
}
model {
   Y ~ bernoulli_logit(X * beta);
   // target += bernoulli_logit_lpmf(Y | X * beta);
}

Running the Model in R

library(rstan)   # stan_model(), sampling(), diagnostics
library(dplyr)   # %>%, mutate(), case_when()

fit_df = mtcars %>%
  mutate(Efficient = case_when(mpg >= median(mpg) ~ 1,
                               TRUE ~ 0)) %>%
  mutate(am = as.factor(am))
fit_matrix = model.matrix(~cyl + disp + hp + drat + wt + am, fit_df)

data_list = list(N = nrow(mtcars), P = ncol(fit_matrix), 
                 Y = fit_df$Efficient, X = fit_matrix)

# Compile the Stan program shown above (file name assumed here)
first_model = stan_model("simple_glm.stan")

model = sampling(
  first_model, 
  data = data_list, 
  chains = 4, 
  iter = 1000, 
  warmup = 500, 
  # init = ,
  # control = list(...), 
  verbose = F,
  refresh = 0
)

Reminder: Hamiltonian Dynamics

HMC Visualization: By Justinkunimune - github.com/jkunimune/hamiltonian-mc, CC0

Runtime Options - Adaptive HMC

  • adapt_delta: target acceptance probability during adaptation

  • max_treedepth: caps the number of leapfrog steps per iteration

  • stepsize_jitter: random jitter applied to the adapted step size

  • All three are passed through sampling()'s control argument (example below)
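
An illustrative call setting all three options through rstan's control list (the values are examples, not recommendations):

model = sampling(
  first_model, 
  data = data_list, 
  chains = 4, iter = 1000, warmup = 500, 
  control = list(adapt_delta = 0.95,     # higher target acceptance -> smaller steps
                 max_treedepth = 12,     # at most 2^12 leapfrog steps per iteration
                 stepsize_jitter = 0.1)  # randomly jitter the adapted step size
)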

Diagnostics

check_hmc_diagnostics(model)

Divergences:
0 of 2000 iterations ended with a divergence.

Tree depth:
1192 of 2000 iterations saturated the maximum tree depth of 10 (59.6%).
Try increasing 'max_treedepth' to avoid saturation.

Energy:
E-BFMI indicated no pathological behavior.
summary(model)$summary[,"Rhat"]
 beta[1]  beta[2]  beta[3]  beta[4]  beta[5]  beta[6]  beta[7]     lp__ 
1.905472 1.636913 1.918515 1.921832 1.923458 1.429901 1.929289 1.018607 

Divergences

  • Simulated trajectory \(\neq\) true trajectory

  • Step size too coarse for the local posterior geometry

    • Leapfrog integration is only approximate

  • Hamiltonian departs from its initial value

    • Total energy (kinetic + potential)

    • Should be preserved along trajectory

  • Sampler may reject samples after a divergence (flagged draws can be counted as below)
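
Flagged divergent transitions can be counted directly from rstan's sampler parameters:

# Count post-warmup divergent transitions across all chains
sp = get_sampler_params(model, inc_warmup = FALSE)
sum(sapply(sp, function(chain) sum(chain[, "divergent__"])))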

Tree Depth Warnings

  • Tree depth controls number of simulation steps

    • \(\leq 2^{max\_treedepth}\) steps

    • Keeps simulated trajectories finite

  • Primarily an efficiency concern

  • Generally recommended not to increase it (saturation can be tallied directly, as below)

    • Saturation often signals model misspecification
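
Analogously to divergences, saturation can be counted from the sampler parameters (assuming the default max_treedepth of 10):

sp = get_sampler_params(model, inc_warmup = FALSE)
sum(sapply(sp, function(chain) sum(chain[, "treedepth__"] >= 10)))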

Estimated Bayesian Fraction of Missing Information

  • Posterior decomposes into energy equivalence classes

  • Low E-BFMI indicates getting “stuck” in energy sets (estimator below)

    • STAN monitors the Hamiltonian during sampling

    • Resampled kinetic energies do not vary enough between draws

  • Insufficient exploration of the posterior

    • e.g., tails that are too heavy
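
For reference, a common form of the estimator over a chain's energies \(E_1, \dots, E_N\) is

\[\widehat{\text{E-BFMI}} = \frac{\sum_{n=2}^{N} \left(E_n - E_{n-1}\right)^2}{\sum_{n=1}^{N} \left(E_n - \bar{E}\right)^2},\]

with values below roughly 0.2 to 0.3 conventionally flagged as problematic.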

Geometric Intuition

Return to our Example: Model Outputs

library(purrr)  # map() and list_rbind()

samples = extract(model)

beta_0 = map(1:dim(samples$beta)[1], function(x){
  return(data.frame(beta0 = samples$beta[x,1], 
                    Sample = x))
}) %>% list_rbind()

Model Outputs (2)

samples = extract(model, "beta", permuted = FALSE)  # keep the chain dimension

beta_0 = map(1:dim(samples)[1], function(x){
  return(data.frame(beta0 = samples[x,,1], 
                    Chain = 1:4, 
                    sample = x))
}) %>% list_rbind()

ShinySTAN Debugging

library(shinystan)

launch_shinystan(model)

ShinySTAN - Summary

ShinySTAN - Divergences

ShinySTAN - Treedepth

ShinySTAN - Energy

ShinySTAN - Autocorrelation

How to Update the Model

data {
  int N;  // Number of observations
  int P;  // Number of fixed effect covariates
  
  array[N] int<lower=0, upper=1> Y;  // Binary outcomes
  matrix[N, P] X;                    // Fixed effects design matrix
}
transformed data {
  matrix[N, P] Q_coef = qr_thin_Q(X) * sqrt(N-1);
  matrix[P, P] R_coef = qr_thin_R(X) / sqrt(N-1);
}
parameters {
  vector[P] beta;  // Coefficients
}
transformed parameters {
  vector[P] theta = R_coef * beta;
}
model {
  beta ~ normal(0, 10);
  Y ~ bernoulli_logit(Q_coef * theta);
}
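
The update applies the thin QR reparameterization: with \(X = QR\),

\[X\beta = \underbrace{\left(Q\sqrt{N-1}\right)}_{Q^{*}}\underbrace{\left(\tfrac{1}{\sqrt{N-1}}R\,\beta\right)}_{\theta \,=\, R^{*}\beta} = Q^{*}\theta,\]

so the likelihood is evaluated on the orthogonal, well-conditioned columns of \(Q^{*}\) rather than the correlated columns of \(X\).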

Running the Updated Model

model = sampling(
  second_model, 
  data = data_list, 
  chains = 4, 
  iter = 1000, 
  warmup = 500, 
  verbose = F,
  refresh = 0
)

Updated Model Performance

check_hmc_diagnostics(model)

Divergences:
0 of 2000 iterations ended with a divergence.

Tree depth:
0 of 2000 iterations saturated the maximum tree depth of 10.

Energy:
E-BFMI indicated no pathological behavior.
summary(model)$summary[8:15,"Rhat"]
 theta[1]  theta[2]  theta[3]  theta[4]  theta[5]  theta[6]  theta[7]      lp__ 
1.0012360 1.0001953 1.0028185 1.0002307 0.9999944 1.0034565 0.9997221 1.0044744 

Updated Model Visualization

Updated Geometry

BRMS - An Alternative

library(brms)  # brm(), stancode(), rhat()

brms_fit = brm(Efficient ~ cyl + disp + hp + drat + wt + am, 
                data = fit_df, family = bernoulli(), 
                silent = 2, refresh = 0, 
                cores = 4, chains = 4, iter = 1000)
stancode(brms_fit)
// generated with brms 2.22.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  array[N] int Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int<lower=1> Kc;  // number of population-level effects after centering
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // regression coefficients
  real Intercept;  // temporary intercept for centered predictors
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
}
model {
  // likelihood including constants
  if (!prior_only) {
    target += bernoulli_logit_glm_lpmf(Y | Xc, Intercept, b);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}

BRMS - Diagnostics

check_hmc_diagnostics(brms_fit$fit)

Divergences:
0 of 2000 iterations ended with a divergence.

Tree depth:
0 of 2000 iterations saturated the maximum tree depth of 10.

Energy:
E-BFMI indicated no pathological behavior.
rhat(brms_fit)
b_Intercept       b_cyl      b_disp        b_hp      b_drat        b_wt 
   1.016590    1.003937    1.046643    1.035852    1.037613    1.014859 
      b_am1   Intercept      lprior        lp__ 
   1.028110    1.045403    1.046189    1.040587 

BRMS - Updating Priors
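
Before overriding the defaults, the priors brms would assign to this model can be inspected with get_prior() (an aside, not from the slides; output omitted):

# Inspect brms's default priors for this formula/family
get_prior(Efficient ~ cyl + disp + hp + drat + wt + am, 
          data = fit_df, family = bernoulli())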

brms_update = brm(Efficient ~ cyl + disp + hp + drat + wt + am, 
                data = fit_df, family = bernoulli(), 
                silent = 2, refresh = 0, 
                cores = 4, chains = 4, iter = 1000, 
                prior = set_prior("normal(0, 10)", class = "b"))
stancode(brms_update)
// generated with brms 2.22.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  array[N] int Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  int<lower=1> Kc;  // number of population-level effects after centering
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
  }
}
parameters {
  vector[Kc] b;  // regression coefficients
  real Intercept;  // temporary intercept for centered predictors
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += normal_lpdf(b | 0, 10);
  lprior += student_t_lpdf(Intercept | 3, 0, 2.5);
}
model {
  // likelihood including constants
  if (!prior_only) {
    target += bernoulli_logit_glm_lpmf(Y | Xc, Intercept, b);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);
}

BRMS - Updated Diagnostics

check_hmc_diagnostics(brms_update$fit)

Divergences:
0 of 2000 iterations ended with a divergence.

Tree depth:
0 of 2000 iterations saturated the maximum tree depth of 10.

Energy:
E-BFMI indicated no pathological behavior.
rhat(brms_update)
b_Intercept       b_cyl      b_disp        b_hp      b_drat        b_wt 
   1.001894    1.005495    1.007951    1.003834    1.006253    1.001785 
      b_am1   Intercept      lprior        lp__ 
   1.002961    1.005025    1.007246    1.005499 

Profiling STAN Models with CmdStanR (1)

data {
  int N;  // Number of observations
  int P;  // Number of fixed effect covariates
  
  array[N] int<lower=0, upper=1> Y;  // Binary outcomes
  matrix[N, P] X;                    // Fixed effects design matrix
}
transformed data {
  matrix[N, P] Q_coef = qr_thin_Q(X) * sqrt(N-1);
  matrix[P, P] R_coef = qr_thin_R(X) / sqrt(N-1);
}
parameters {
  vector[P] beta;  // Coefficients
}
transformed parameters {
  vector[P] theta;
  profile("Transform") {
    theta = R_coef * beta;
  }
}
model {
  profile("Priors") {
    beta ~ normal(0, 10);
  }
  profile("Likelihood") {
    target += bernoulli_logit_lpmf(Y | Q_coef * theta);
  }
}

Profiling STAN Models with CmdStanR (2)

library(cmdstanr)  # cmdstan_model(), $sample(), $profiles()

model = cmdstan_model("Profile_Mod.stan")
fit = model$sample(data = data_list, chains = 1)
fit$profiles()[[1]][,c(1,3,4,5,8)]
        name  total_time forward_time reverse_time autodiff_calls
1  Generated 0.000158059  0.000158059   0.00000000              0
2     Priors 0.025370400  0.022040400   0.00332996         173154
3 Likelihood 0.142509000  0.113486000   0.02902290         173154