Passing command line arguments into Stan model code

Overall question: Is there a way to pass an argument from the command line into .stan model code, to alter a function that’s being used in the transformed parameters block? I don’t think I’m phrasing this well, but hopefully the background below makes the question clear.


I’m running a bunch of models that are essentially the same, except they each use a different function in the transformed parameters block. Currently, I have a different .stan file for each, where the function logistic_gauge would be replaced by one of five other functions, depending on the model:

#include fcns_data_params_marg.stan
transformed parameters {
  vector[n0] beta;
  for (n in 1:n0) {
    beta[n] = logistic_gauge(W_trunc[n], dep);
  }
}
#include model_gq_marg.stan

(For reference, I’ve included the code for the #include statements at the end of this post.)

I kick off each of these models from the command line with the following code:

#!/bin/zsh
#
trap '' HUP
stanc_exe="/opt/homebrew/Caskroom/miniconda/base/envs/stan/bin/cmdstan/bin/stanc"
# compile model and link c++ 
inc_path="stan/trunc/"
for gauge_name in "gauss" "logistic" "inv_log" "asym_log" "dirichlet" "rectangular"
do
  object="stan/trunc/bivar_${gauge_name}_${threshold}"
  ${stanc_exe} ${object}.stan --include-paths=${inc_path}
  cmdstan_model ${object}
  for level in "low" "mid" "high"
  do
    for i in {1..100}
    do
      export gauge_name level i
      nohup ./shell_scripts/sampling_trunc_calibrate.sh > shell_scripts/console_output/calibrate/${gauge_name}_${level}_${i}.txt 2>&1 &
      sleep 10
    done
    sleep 10
  done
  sleep 10
done

Where sampling_trunc_calibrate.sh is:

#!/bin/zsh
# model run 
trap '' HUP
datafile="../data/${gauge_name}/${level}_${i}.json"
basedir="./stan/"
cd ${basedir}
model="trunc/bivar_${gauge_name}"
outbase="csv_fits/calibrate/${gauge_name}/${level}_${i}_trunc"

# run model with 3 chains
./${model} sample num_chains=3 \
                  data file=${datafile} \
                  output file=${outbase}.csv \
                  num_threads=3

echo "Model has finished running all 3 chains"

Is there a way for me to have one .stan file (instead of six) and pass in the gauge_name argument from the command line to alter the function that’s used in the transformed parameters block? I can’t think of an obvious way to do this, which is why I currently have six different .stan files.

Thank you in advance for your help!! I really appreciate it.


Code for #include statements:
fcns_data_params_marg.stan file:

// shared functions, data input values and shared parameter declaration
functions {
  #include ../gauge_fcns.stanfunctions
  #include truncGamma.stanfunctions
}

data {
  int<lower=1> N;
  int<lower=1> n0;
  array[N] real<lower=0> R;
  array[N] real<lower=0, upper=1> W;
  array[N] real<lower=0> r0_w;
  array[n0] int<lower=1> idx;
}

transformed data {
  array[n0] real<lower=0> R_trunc = R[idx];
  array[n0] real<lower=0, upper=1> W_trunc = W[idx];
  array[n0] real<lower=0> r0_w_trunc = r0_w[idx];
}

parameters {
  real<lower=0> alpha;
  real<lower=0, upper=1> dep;
}

model_gq_marg.stan file:

// same model and gen_quant declarations across differing gauge functions
model {
  // alpha ~ student_t(4, 0, 2.5); 
  alpha ~ gamma(4, 2);
  dep ~ uniform(0, 1);
  
  for (n in 1:n0) {
    target += trunc_gamma_lpdf(R_trunc[n] | r0_w_trunc[n], alpha, beta[n]);
  }
}

generated quantities {
  vector[n0] log_lik;
  for (n in 1:n0) {
    log_lik[n] = trunc_gamma_lpdf(R_trunc[n] | r0_w_trunc[n], alpha, beta[n]);
  }
}

I can upload the custom functions if that would be helpful, but I didn’t want to make this post longer than it already is. Thank you for taking the time to read through this!

The most common way to do this is to add an item to the data block, int<lower=1, upper=6> gauge_choice, and then use a chain of if/else statements in your transformed parameters block to select the function. Packages like rstanarm use this pattern extensively.

I did consider something similar, but that requires updating the data file to match whichever gauge function I’m using. I’m ultimately fitting these 6 models to the same datasets (100 total), so while it’s not impossible to modify the data files to set gauge_choice for each model, it would be messier (imo) than having separate model files, as I currently do.

That is why I was wondering if there was some way to use a strategy similar to the one you suggest, but by reading an argument from the command line instead of changing the data file.

Unless - is there a way to not modify the saved json file but “append” the appropriate gauge_choice indicator when the datafile is being read by the model (from a zsh script)?

I recognize this is really just a user experience/preference issue, so I completely understand if the best solution is to just stick with what I’m doing. But I appreciate your insight and help!

You could probably use a command line tool like jq to do that. I think it would still require at least one “scratch” file to hold the modified copy of the data.
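
A minimal sketch of the jq approach, assuming jq is installed and the model declares an integer gauge_choice in its data block. The function names gauge_to_int and write_scratch_data are hypothetical, and so is the name-to-integer mapping, which would have to match the order of the if/else branches in the Stan code. The saved JSON is left untouched; the merged copy goes to a scratch file.

```shell
#!/bin/sh
# Hypothetical mapping from gauge name to the integer the model reads as
# gauge_choice. Adjust to match the if/else order in the .stan file.
gauge_to_int() {
  case "$1" in
    gauss)       echo 1 ;;
    logistic)    echo 2 ;;
    inv_log)     echo 3 ;;
    asym_log)    echo 4 ;;
    dirichlet)   echo 5 ;;
    rectangular) echo 6 ;;
    *) echo "unknown gauge: $1" >&2; return 1 ;;
  esac
}

# Usage: write_scratch_data GAUGE_NAME INPUT_JSON OUTPUT_JSON
# Writes a copy of INPUT_JSON with gauge_choice added; INPUT_JSON is not changed.
write_scratch_data() {
  choice=$(gauge_to_int "$1") || return 1
  # --argjson passes the value as a JSON number rather than a string
  jq --argjson gc "$choice" '. + {gauge_choice: $gc}' "$2" > "$3"
}
```

In sampling_trunc_calibrate.sh this could be called as write_scratch_data "$gauge_name" "$datafile" "$scratch" before sampling, with data file=${scratch} passed to the model and the scratch file removed afterwards. Building the scratch name from gauge_name, level and i (and keeping the .json extension) stops the parallel jobs from overwriting each other's copies.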