Lack of match between synthetic data and Stan posteriors

Hello,

I’m currently testing a hierarchical logistic regression model against synthetic data, and I’ve noticed a discrepancy between the Stan posteriors and the parameter values used to generate the data.

My Stan implementation of the model is below.

functions {
  int[] sequence(int start, int end) { 
    int seq[end - start + 1];
    for (n in 1:num_elements(seq)) {
      seq[n] = n + start - 1;
    }
    return seq; 
  } 

  real partial_log_lik_lpmf(int[] seq, int start, int end,
		     data int ncat, data int nvar, data int ntotvar, data int[] Y, data matrix x1, data matrix x2, data matrix x3, data matrix x4, data int[] J_1, data int[] J_2, 
                     vector b_a2, vector b_a3, vector b_a4, vector b_bc, vector b_b1tt, vector b_b2tt, vector b_b3tt, vector b_b4tt, matrix avail) {
                     
    real ptarget = 0;
    int N = end - start + 1;
    // Define matrices/vector for x, alpha, beta
    matrix[ncat,nvar] x;
    vector[ncat] alpha;
    matrix[nvar,ncat] beta;

    array[N] row_vector[ncat] a = rep_array(rep_row_vector(0,ncat), N);
      
    // initialize linear predictor term
    // Terms are btc, b1tt, b2tt, b3tt, b4tt

    array[N] row_vector[ntotvar] b = rep_array(rep_row_vector(0,ntotvar), N);

    for (n in 1:N) {
      int nn = n + start - 1;
      // add to linear predictor term
      a[n] += [0,b_a2[J_2[nn]], b_a3[J_2[nn]], b_a4[J_2[nn]]];
      // add to linear predictor term
      // Terms are btc (0), b1tt, b2tt, b3tt, b4tt
      b[n] += [b_bc[J_1[nn]], b_b1tt[J_1[nn]], b_b2tt[J_1[nn]], b_b3tt[J_1[nn]], b_b4tt[J_1[nn]]];
      // Each x and beta is a matrix with dimensions alts x variables
      // Our x will be the time/cost coming in as inputs
      x = to_matrix([x1[nn],x2[nn],x3[nn],x4[nn]]);

      // Our betas will be the hierarchical slope parameters (2 x 4)
      beta = -1*[b[n,1] * b[n,2:5], [b[n,1],0,0,b[n,1]]];
      // Our alphas will be the hierarchical intercept parameters; log(avail[nn,i]) gives an availability measure
      alpha = [a[n,1]+log(avail[nn,1]), b[n,1] * a[n,2]+log(avail[nn,2]), b[n,1] * a[n,3]+log(avail[nn,3]), b[n,1] * a[n,4]+log(avail[nn,4])]';
      ptarget += categorical_logit_glm_lupmf(Y[nn] | x, alpha, beta);
    }
    return ptarget;
  }
}

data {
  int<lower=1> N;  // total number of observations
  int<lower=2> ncat;  // number of categories
  int<lower=2> nvar;  // number of variables for each alternative
  int<lower=2> ntotvar;  // number of total variables
  int grainsize;  // grainsize for threading
  array[N] int Y;  // response variable
  // covariate matrix for alt1
  matrix[N,nvar] x1; // matrix[N,nvar] x1;
  // covariate matrix for alt2 
  matrix[N,nvar] x2; // matrix[N,nvar] x2;
  // covariate matrix for alt3 
  matrix[N,nvar] x3; // matrix[N,nvar] x3;
  // covariate matrix for alt4
  matrix[N,nvar] x4; // matrix[N,nvar] x4;
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  array[N] int<lower=1> J_1;  // grouping indicator per observation
  // group-level predictor values
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  array[N] int<lower=1> J_2;  // grouping indicator per observation
  matrix<lower=0,upper=1>[N,ncat] avail; // whether an alternative is available
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int seq[N] = sequence(1, N);
}
parameters {
  real<lower=0> mu_b1tt; // population-level effects
  real<lower=0> mu_b2tt; // population-level effects
  real<lower=0> mu_b3tt; // population-level effects
  real<lower=0> mu_b4tt; // population-level effects
  real mu_a2; // population-level effects
  real mu_a3; // population-level effects
  real mu_a4; // population-level effects
  real<lower=0> sd1_bc;  // group-level standard deviations
  real<lower=0> sd1_b1tt;  // group-level standard deviations
  real<lower=0> sd1_b2tt;  // group-level standard deviations
  real<lower=0> sd1_b3tt;  // group-level standard deviations
  real<lower=0> sd1_b4tt;  // group-level standard deviations
  real<lower=0> sd2_a2;  // group-level standard deviations
  real<lower=0> sd2_a3;  // group-level standard deviations
  real<lower=0> sd2_a4;  // group-level standard deviations
  vector<offset=0, multiplier = sd1_bc>[N_1] log_b_bc;
  vector<offset=mu_b1tt, multiplier = sd1_b1tt>[N_1] b_b1tt;
  vector<offset=mu_b2tt, multiplier = sd1_b2tt>[N_1] b_b2tt;
  vector<offset=mu_b3tt, multiplier = sd1_b3tt>[N_1] b_b3tt;
  vector<offset=mu_b4tt, multiplier = sd1_b4tt>[N_1] b_b4tt;
  vector<offset=mu_a2, multiplier = sd2_a2>[N_2] b_a2;
  vector<offset=mu_a3, multiplier = sd2_a3>[N_2] b_a3;
  vector<offset=mu_a4, multiplier = sd2_a4>[N_2] b_a4;
}
transformed parameters{
	vector<lower=0>[N_1] b_bc = exp(log_b_bc);
}

model {
  // likelihood including constants
  if (!prior_only) {
       target += reduce_sum(partial_log_lik_lpmf, seq,
                     grainsize, ncat, nvar, ntotvar, Y, x1, x2, x3, x4, J_1, J_2,
		        b_a2, b_a3, b_a4, b_bc, b_b1tt, b_b2tt, b_b3tt, b_b4tt, avail);

  }
  // priors
  mu_b1tt ~ normal(3,0.25);
  mu_b2tt ~ normal(3,0.25);
  mu_b3tt ~ normal(3,0.25);
  mu_b4tt ~ normal(3,0.25);
  mu_a2 ~ normal(-1,0.25);
  mu_a3 ~ normal(-1,0.25);
  mu_a4 ~ normal(-1,0.25);
  sd1_bc ~ student_t(3,0,1);
  sd1_b1tt ~ student_t(3,0,1);
  sd1_b2tt ~ student_t(3,0,1);
  sd1_b3tt ~ student_t(3,0,1);
  sd1_b4tt ~ student_t(3,0,1);
  sd2_a2 ~ student_t(3,0,1);
  sd2_a3 ~ student_t(3,0,1);
  sd2_a4 ~ student_t(3,0,1);
  b_a2 ~ normal(mu_a2, sd2_a2);
  b_a3 ~ normal(mu_a3, sd2_a3);
  b_a4 ~ normal(mu_a4, sd2_a4);
  log_b_bc ~ normal(0, sd1_bc);
  b_b1tt ~ gamma(mu_b1tt, sd1_b1tt);
  b_b2tt ~ gamma(mu_b2tt, sd1_b2tt);
  b_b3tt ~ gamma(mu_b3tt, sd1_b3tt);
  b_b4tt ~ gamma(mu_b4tt, sd1_b4tt);
}

The synthetic data were generated with the following R script.

rm(list = ls())


# install.packages("brms", repos='http://cran.us.r-project.org')
library(tidyverse)
library(brms)

library(cmdstanr) # observe startup messages
library(posterior)
library(bayesplot)

set.seed(1232)

n <- 2000 # number of observations

travelTimeDrive <- abs(rnorm(n, 13, 13))
travelCostDrive <- abs(rnorm(n, 136, 202))
travelTimeWalk <- abs(rnorm(n, 123, 197))
travelTimeBike <- abs(rnorm(n, 31, 50))
travelTimeTransit <- abs(rnorm(n, 45, 25))
travelCostTransit <- abs(rnorm(n, 212, 128))
av_car <- rep(1, n)
av_walk <- rep(1, n)
av_bike <- rep(1, n)
av_transit <- rep(1, n)

trippurp <- round(pmax(pmin(rnorm(n, mean=2.77, sd=1.24), 5), 1))


# Create the personid column
# Sample n/2 IDs between 1000 and 2000 (with replacement), then shuffle them
personid <- sample(seq(1000, 2000), size = n/2, replace = TRUE)
personid <- sample(personid, size = n/2)

# Duplicate the personid vector to fill the column and add 0.1 or 0.2 randomly to each ID
personid <- rep(personid, each = 2)
personid <- personid + sample(c(0.1, 0.2), size = n, replace = TRUE)


# create data frame
fake_data <- data.frame(
  travelTimeDrive = travelTimeDrive,
  travelCostDrive = travelCostDrive,
  travelTimeWalk = travelTimeWalk,
  travelTimeBike = travelTimeBike,
  travelTimeTransit = travelTimeTransit,
  travelCostTransit = travelCostTransit,
  av_car = av_car,
  av_walk = av_walk,
  av_bike = av_bike,
  av_transit = av_transit,
  personid = personid,
  trippurp = trippurp
)

fake_data$av_walk <- ifelse(fake_data$travelTimeDrive < 6, 1, fake_data$av_walk)


# Load the package
library(writexl)


n <- nrow(fake_data)

trippurp <- fake_data$trippurp
travelTimeWalk <- fake_data$travelTimeWalk
travelTimeBike <- fake_data$travelTimeBike
travelTimeDrive <- fake_data$travelTimeDrive
travelTimeTransit <- fake_data$travelTimeTransit
travelCostDrive <- fake_data$travelCostDrive
travelCostTransit <- fake_data$travelCostTransit

# define alpha parameters

alpha_2 <- numeric(n)
sd2_a2 <- numeric(n)
mu_a2 <- rnorm(1, -1, 0.25)
sd2_a2 <- abs(rt(1, df = 3, ncp = 0) * 1)
for (i in 1:5) {
  # one draw per trip-purpose level, recycled across all matching rows
  alpha_2[trippurp == i] <- rnorm(1, mu_a2, sd2_a2)
}

alpha_3 <- numeric(n)
sd2_a3 <- numeric(n)
mu_a3 <- rnorm(1, -1, 0.25)
sd2_a3 <- abs(rt(1, df = 3, ncp = 0) * 1)
for (i in 1:5) {
  # one draw per trip-purpose level, recycled across all matching rows
  alpha_3[trippurp == i] <- rnorm(1, mu_a3, sd2_a3)
}

alpha_4 <- numeric(n)
sd2_a4 <- numeric(n)
mu_a4 <- rnorm(1, -1, 0.25)
sd2_a4 <- abs(rt(1, df = 3, ncp = 0) * 1)
for (i in 1:5) {
  # one draw per trip-purpose level, recycled across all matching rows
  alpha_4[trippurp == i] <- rnorm(1, mu_a4, sd2_a4)
}

# define b_cost parameter
sd1_bc <- numeric(n)
log_b_cost <- numeric(n)
sd1_bc <- abs(rt(1, df = 3, ncp = 0) * 1)
for (i in 1:n) {
  log_b_cost[i] <- rnorm(1, 0, sd1_bc)
}

b_cost <- exp(log_b_cost)

# define b_tt parameters
sd1_b1tt <- numeric(n)
b_tt_1 <- numeric(n)
mu_b1tt <- numeric(n)
mu_b1tt <- abs(rnorm(1, 3, 0.25))
sd1_b1tt <- abs(rt(1, df = 3, ncp = 0) * 1)
for (i in 1:n) {
  b_tt_1[i] <- rgamma(1, mu_b1tt, sd1_b1tt) # Generate b_tt_1 from gamma(shape, rate) distribution
}

mu_b2tt <- numeric(n)
sd1_b2tt <- numeric(n)
b_tt_2 <- numeric(n)
mu_b2tt <- abs(rnorm(1, 3, 0.25))
sd1_b2tt <- abs(rt(1, df = 3, ncp = 0) * 1)
for (i in 1:n) {
  b_tt_2[i] <- rgamma(1, mu_b2tt, sd1_b2tt) # Generate b_tt_2 from gamma(shape, rate) distribution
}

mu_b3tt <- numeric(n)
sd1_b3tt <- numeric(n)
b_tt_3 <- numeric(n)
mu_b3tt <- abs(rnorm(1, 3, 0.25))
sd1_b3tt <- abs(rt(1, df = 3, ncp = 0) * 1)
for (i in 1:n) {
  b_tt_3[i] <- rgamma(1, mu_b3tt, sd1_b3tt) # Generate b_tt_3 from gamma(shape, rate) distribution
}

mu_b4tt <- numeric(n)
sd1_b4tt <- numeric(n)
b_tt_4 <- numeric(n)
mu_b4tt <- abs(rnorm(1, 3, 0.25))
sd1_b4tt <- abs(rt(1, df = 3, ncp = 0) * 1)
for (i in 1:n) {
  b_tt_4[i] <- rgamma(1, mu_b4tt, sd1_b4tt) # Generate b_tt_4 from gamma(shape, rate) distribution
}


travelCostDrive1 <- travelCostDrive / 100
travelCostTransit1 <- travelCostTransit / 100
travelTimeDrive1 <- travelTimeDrive / 60
travelTimeWalk1 <- travelTimeWalk / 60
travelTimeBike1 <- travelTimeBike / 60
travelTimeTransit1 <- travelTimeTransit / 60

# Utility functions
V1 <- -1 * b_cost * (b_tt_1 * travelTimeDrive1 + travelCostDrive1)
V2 <- alpha_2 + -1 * b_cost * (b_tt_2 * travelTimeWalk1)
V3 <- alpha_3 + -1 * b_cost * (b_tt_3 * travelTimeBike1)
V4 <- alpha_4 + -1 * b_cost * (b_tt_4 * travelTimeTransit1 + travelCostTransit1)

# Normalize the utilities
max_utilities <- apply(cbind(V1, V2, V3, V4), 1, max)
V1_norm <- V1 - max_utilities
V2_norm <- V2 - max_utilities
V3_norm <- V3 - max_utilities
V4_norm <- V4 - max_utilities


p1 <- exp(V1_norm) / rowSums(exp(cbind(V1_norm, V2_norm, V3_norm, V4_norm)))
p2 <- exp(V2_norm) / rowSums(exp(cbind(V1_norm, V2_norm, V3_norm, V4_norm)))
p3 <- exp(V3_norm) / rowSums(exp(cbind(V1_norm, V2_norm, V3_norm, V4_norm)))
p4 <- exp(V4_norm) / rowSums(exp(cbind(V1_norm, V2_norm, V3_norm, V4_norm)))


# initialize trptrans vector
trptrans <- rep(NA, nrow(fake_data))

# generate a vector of random numbers
ra <- runif(nrow(fake_data))

# loop over all rows of fake_data
for (i in 1:nrow(fake_data)) {
  # find the index of the first element in the cumulative sum of probabilities that is greater than ra[i]
  trptrans[i] <- min(which(cumsum(c(p1[i], p2[i], p3[i], p4[i])) > ra[i]))
}

# add the trptrans vector to the data as a new column
fake_data$trptrans <- trptrans


# Create a data frame containing your dataset
my_data <- data.frame(mu_b1tt, mu_b2tt, mu_b3tt, mu_b4tt, mu_a2, mu_a3, mu_a4, sd1_bc, sd1_b1tt, sd1_b2tt, sd1_b3tt, sd1_b4tt, sd2_a2, sd2_a3, sd2_a4)

# Extract the columns you want to use into a separate data frame
result_vector <- my_data[, c("mu_b1tt", "mu_b2tt", "mu_b3tt", "mu_b4tt", "mu_a2", "mu_a3", "mu_a4", "sd1_bc", "sd1_b1tt", "sd1_b2tt", "sd1_b3tt", "sd1_b4tt", "sd2_a2", "sd2_a3", "sd2_a4")]

# Calculate the means, medians, standard deviations and median absolute deviations
means <- sapply(result_vector, mean)
medians <- sapply(result_vector, median)
sds <- sapply(result_vector, sd)
mads <- sapply(result_vector, mad)

# Create a new data frame with the desired format
df <- data.frame(Mean = round(means,2), Median = round(medians,2), SD = round(sds,2), MAD = round(mads,2), row.names = c("mu_b1tt", "mu_b2tt", "mu_b3tt", "mu_b4tt", "mu_a2", "mu_a3", "mu_a4", "sd1_bc", "sd1_b1tt", "sd1_b2tt", "sd1_b3tt", "sd1_b4tt", "sd2_a2", "sd2_a3", "sd2_a4"))

# Print the new data frame
print(df)



# Rescale costs to hundreds of dollars and times to hours
mod = fake_data%>%mutate(travelCostDrive=travelCostDrive/100, travelCostTransit=travelCostTransit/100, travelTimeDrive=travelTimeDrive/60, travelTimeWalk=travelTimeWalk/60,travelTimeBike=travelTimeBike/60,
                         travelTimeTransit=travelTimeTransit/60)
mod = mod[order(mod$personid),]

mod$personid = cumsum(c(0, diff(mod$personid)) != 0) + 1

model.1 = cmdstan_model("mtc_equity_23.stan", cpp_options = list(stan_opencl = TRUE, stan_threads = TRUE), stanc_options = list("O1"))

samp_ct = nrow(mod)

data_list = list(N = samp_ct,
                 ncat = 4L,
                 nvar = 2L,
                 ntotvar = 5L,
                 grainsize = 1,
                 Y = as.integer(mod$trptrans),
                 x1 = as.matrix(mod%>%select(travelTimeDrive,travelCostDrive)),
                 x2 = cbind(mod$travelTimeWalk,rep(0,samp_ct)),
                 x3 = cbind(mod$travelTimeBike,rep(0,samp_ct)),
                 x4 = as.matrix(mod%>%select(travelTimeTransit,travelCostTransit)),
                 avail = as.matrix(mod%>%select(av_car,av_walk,av_bike,av_transit)),
                 N_1 = as.integer(max(mod$personid)), # number of households
                 M_1 = 1L,
                 J_1 = as.integer(mod$personid),
                 # Z_1 = cbind(rep(1,samp_ct),mod%>%select(race_2,race_3,race_4,race_5,hhfaminc)),
                 Z_1 = rep(1,samp_ct), 
                 N_2 = as.integer(max(mod$trippurp)), # number of trip purposes
                 M_2 = 1L,
                 J_2 = as.integer(mod$trippurp),
                 prior_only = 0L)

model_fit = model.1$sample(data = data_list,
                           refresh = 100,
                           adapt_delta = 0.90,
                           seed = 76543,
                           iter_warmup  = 2000,
                           iter_sampling =1000,
                           chains = 4,
                           parallel_chains = 4,
                           threads_per_chain = 9,
                           opencl_ids = c(0, 0))

model_fit$save_object(file = "fit_mtc_model-lognorm-v23-2.RDS")

original.R (8.9 KB)


The attached image shows the Stan posteriors (lower table) alongside the synthetic parameter values used to generate the data (upper table). The posteriors clearly do not recover the synthetic values.
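For reference, the comparison can be assembled in R roughly as follows. This is only a sketch: it assumes the fit object model_fit and the data frame df of synthetic values created in the script above.

# Posterior summaries for the population-level parameters
post <- model_fit$summary(variables = c("mu_b1tt","mu_b2tt","mu_b3tt","mu_b4tt",
                                        "mu_a2","mu_a3","mu_a4",
                                        "sd1_bc","sd1_b1tt","sd1_b2tt","sd1_b3tt","sd1_b4tt",
                                        "sd2_a2","sd2_a3","sd2_a4"))

# Side-by-side table: synthetic value vs. posterior mean/median (df row names match the variable names)
comparison <- data.frame(parameter   = post$variable,
                         synthetic   = df[post$variable, "Mean"],
                         post_mean   = round(post$mean, 2),
                         post_median = round(post$median, 2))
print(comparison)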

Note that certain parameters in my model must be positive, as indicated by the constraints in the parameters block. One of them is b_bc, the cost parameter.

Additionally, I repeatedly encounter the following warning during warmup (my reading of it, with a small R illustration, follows the message):

Chain 3 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 3 Exception: gamma_lpdf(OpenCL): Random variable[55, 0] = -1.07909e-05, but it must be not NaN! (in '/tmp/RtmpvxeqsQ/model-945ba1d099263.stan', line 138, column 2 to column 36)
Chain 3 If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 3 but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
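My understanding of this message (which may be wrong): b_b1tt through b_b4tt are declared with offset/multiplier but no lower bound, so the sampler can propose slightly negative values, and the gamma density is zero there. A quick R illustration with arbitrary shape/rate values:

# The gamma density is 0 for negative values, so the log density is -Inf,
# which is what forces the proposal to be rejected.
dgamma(-1.07909e-05, shape = 3, rate = 0.5)              # 0
dgamma(-1.07909e-05, shape = 3, rate = 0.5, log = TRUE)  # -Inf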

I also get these warnings at the end of the run:

Warning: 2 chain(s) finished unexpectedly!
Warning: The returned fit object will only read in results of successful chains. Please use read_cmdstan_csv() to read the results of the failed chains separately. Use the $output(chain_id) method for more output of the failed chains.
Warning: 1999 of 2000 (100.0%) transitions ended with a divergence.
See https://mc-stan.org/misc/warnings for details.

Warning: 1 of 2000 (0.0%) transitions hit the maximum treedepth limit of 10.
See https://mc-stan.org/misc/warnings for details.

Warning: 1 of 2 chains had an E-BFMI less than 0.2.
See https://mc-stan.org/misc/warnings for details.

I would appreciate any help identifying the reason for this discrepancy.

The model above is one I started last summer and we are resurrecting this summer.

We’ve put together a much simplified model and checked results against an external package using maximum simulated likelihood (MSL). The simplified model has the form:
Pr(i) = \frac{e^{V_i}}{\sum_{j=1}^{J} e^{V_j}}
where V_i = \alpha_i - \beta_{cost,i}(\beta_{time,i} \times TIME + COST), \alpha_1 = 0, \alpha_i is an intercept that does not vary across observations, and \beta_{cost,i} and \beta_{time,i} are parameters that vary across individuals (multiple observations per individual, i.e., a panel effect).
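As a concrete illustration of this formula (the numbers here are made up, not the test parameters), the choice probabilities for a single observation can be computed as:

# Hypothetical parameter values for one individual (illustration only)
alpha  <- c(0, -1.0, -1.1, -1.2)     # alpha_1 fixed to 0
b_cost <- 1.2                        # individual-level cost coefficient
b_time <- c(18, 14, 16, 13)          # individual-level time coefficients
time   <- c(0.25, 1.50, 0.60, 0.75)  # travel times (hours)
cost   <- c(1.30, 0, 0, 2.10)        # travel costs; alternatives 2 and 3 have no cost term

V <- alpha - b_cost * (b_time * time + cost)  # V_i = alpha_i - b_cost*(b_time_i*TIME + COST)
p <- exp(V - max(V)) / sum(exp(V - max(V)))   # softmax, shifted by max(V) for numerical stability
round(p, 3)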

We use the following test synthetic data:
test_data.csv (159.0 KB)

The dependent variable trptrans takes one of four values (1, 2, 3, or 4). It is generated from the logit/softmax function above (see the short sketch after the table). The synthetic test parameter values are:

Value
mu_b1tt 3.08
mu_b2tt 2.74
mu_b3tt 3.04
mu_b4tt 2.71
alpha_2 -0.96
alpha_3 -1.09
alpha_4 -1.23
sd1_bc 1.08
sd1_b1tt 0.34
sd1_b2tt 0.36
sd1_b3tt 0.62
sd1_b4tt 0.24
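For reference, the categorical draw itself (done with a cumulative-sum loop in the generation script above) is equivalent to a single sample() call; a minimal sketch for one row with hypothetical probabilities:

# Hypothetical choice probabilities for one row (must sum to 1)
p_row <- c(0.55, 0.10, 0.15, 0.20)
# Draw one of the four alternatives; equivalent to the cumsum/which approach used above
trptrans_i <- sample(1:4, size = 1, prob = p_row)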

We ran the test data through an external package with validated code to estimate such models by MSL. The test results are

Estimate s.e. t.rat.(0) Rob.s.e. Rob.t.rat.(0)
mu_b1tt 2.9174 0.09275 31.456 0.11102 26.277
mu_b2tt 2.4998 0.09575 26.107 0.11465 21.803
mu_b3tt 2.851 0.0735 38.791 0.08301 34.345
mu_b4tt 2.5247 0.09631 26.214 0.11156 22.632
alpha_2 -1.1343 0.27674 -4.099 0.24999 -4.538
alpha_3 -0.7965 0.16128 -4.939 0.14356 -5.548
alpha_4 -1.1468 0.19619 -5.845 0.17259 -6.644
sd1_bc 1.2139 0.13109 9.26 0.13726 8.844
sd1_b1tt 0.3767 0.03034 12.417 0.03011 12.511
sd1_b2tt 0.2533 0.06582 3.848 0.07723 3.28
sd1_b3tt 0.4136 0.03035 13.625 0.02358 17.536
sd1_b4tt 0.1103 0.04175 2.641 0.02057 5.361

These results appear to confirm that the test data are coded correctly: the MSL estimation started with all parameters at 0 and still roughly recovered the true values.

We updated our Stan code (shown after the short illustration below). We assume lognormal priors for the cost and time coefficients, as these must be positive (they are negative once multiplied by -1). The prior SDs must be kept tight given the exp()'s in the lognormal and in the logit/softmax function.
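To illustrate why the SDs must be tight (a quick numerical check, independent of the actual data): with a lognormal, even a modest SD on the log scale produces a very heavy upper tail.

set.seed(1)
tight <- exp(rnorm(1e5, mean = 3, sd = 0.25))  # tight SD on the log scale
loose <- exp(rnorm(1e5, mean = 3, sd = 1.00))  # loose SD on the log scale

# With sd = 0.25 the coefficient stays near exp(3) ~ 20;
# with sd = 1 the 95th percentile is roughly an order of magnitude larger.
quantile(tight, c(0.05, 0.50, 0.95))  # about 13, 20, 30
quantile(loose, c(0.05, 0.50, 0.95))  # about 4, 20, 104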

data {
  int<lower=1> N;  // total number of observations
  int<lower=2> ncat;  // number of categories (1,2,3,4)
  int<lower=2> nvar;  // number of variables for each alternative (cost and time, where cost=0 for categories 2 and 3)
  int<lower=2> ntotvar;  // number of total variables
  array[N] int Y;  // response variable (values of 1,2,3,4)
  // covariate matrix for alt1
  matrix[N,nvar] x1; // matrix[N,nvar] x1;
  // covariate matrix for alt2 
  matrix[N,nvar] x2; // matrix[N,nvar] x2;
  // covariate matrix for alt3 
  matrix[N,nvar] x3; // matrix[N,nvar] x3;
  // covariate matrix for alt4
  matrix[N,nvar] x4; // matrix[N,nvar] x4;
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  array[N] int<lower=1> J_1;  // grouping indicator per observation (1056 unique values)
  int prior_only;  // should the likelihood be ignored?
}
parameters {
  real<lower=0.001> mu_b1tt; // population-level effects
  real<lower=0.001> mu_b2tt; // population-level effects
  real<lower=0.001> mu_b3tt; // population-level effects
  real<lower=0.001> mu_b4tt; // population-level effects
  real b_a2; // population-level effects
  real b_a3; // population-level effects
  real b_a4; // population-level effects
  real<lower=0.001> sd1_bc;  // group-level standard deviations
  real<lower=0.001> sd1_b1tt;  // group-level standard deviations
  real<lower=0.001> sd1_b2tt;  // group-level standard deviations
  real<lower=0.001> sd1_b3tt;  // group-level standard deviations
  real<lower=0.001> sd1_b4tt;  // group-level standard deviations
  vector<offset=0, multiplier = sd1_bc>[N_1] log_b_bc;
  vector<offset=mu_b1tt, multiplier = sd1_b1tt>[N_1] log_b_b1tt;
  vector<offset=mu_b2tt, multiplier = sd1_b2tt>[N_1] log_b_b2tt;
  vector<offset=mu_b3tt, multiplier = sd1_b3tt>[N_1] log_b_b3tt;
  vector<offset=mu_b4tt, multiplier = sd1_b4tt>[N_1] log_b_b4tt;
}
transformed parameters{
	vector<lower=0.001>[N_1] b_bc = exp(log_b_bc);
	vector<lower=0.001>[N_1] b_b1tt = exp(log_b_b1tt);
	vector<lower=0.001>[N_1] b_b2tt = exp(log_b_b2tt);
	vector<lower=0.001>[N_1] b_b3tt = exp(log_b_b3tt);
	vector<lower=0.001>[N_1] b_b4tt = exp(log_b_b4tt);
}

model {
  // priors
  mu_b1tt ~ normal(3,0.25);
  mu_b2tt ~ normal(3,0.25);
  mu_b3tt ~ normal(3,0.25);
  mu_b4tt ~ normal(2,0.25);
  sd1_bc ~ student_t(3,0,1);
  sd1_b1tt ~ student_t(3,0,1);
  sd1_b2tt ~ student_t(3,0,1);
  sd1_b3tt ~ student_t(3,0,1);
  sd1_b4tt ~ student_t(3,0,1);
  b_a2 ~ normal(-0.9, 1);
  b_a3 ~ normal(-1.1, 1);
  b_a4 ~ normal(-1, 1);
  log_b_bc ~ normal(0, sd1_bc);
  log_b_b1tt ~ normal(mu_b1tt, sd1_b1tt);
  log_b_b2tt ~ normal(mu_b2tt, sd1_b2tt);
  log_b_b3tt ~ normal(mu_b3tt, sd1_b3tt);
  log_b_b4tt ~ normal(mu_b4tt, sd1_b4tt);
  
  real ptarget = 0;
  // Define matrices/vector for x, alpha, beta
  matrix[ncat,nvar] x;
  vector[ncat] alpha;
  matrix[nvar,ncat] beta;

  row_vector[ncat] a = [0, b_a2, b_a3, b_a4];
	  
  // initialize linear predictor term
  // Terms are btc, b1tt, b2tt, b3tt, b4tt

  array[N] row_vector[ntotvar] b = rep_array(rep_row_vector(0,ntotvar), N);

  for (n in 1:N) {
	// add to linear predictor term
	// Terms are btc (0), b1tt, b2tt, b3tt, b4tt
	b[n] += [b_bc[J_1[n]], b_b1tt[J_1[n]], b_b2tt[J_1[n]], b_b3tt[J_1[n]], b_b4tt[J_1[n]]];
	// Each x and beta is a matrix with dimensions alts x variables
	// Our x will be the time/cost coming in as inputs
	x = to_matrix([x1[n],x2[n],x3[n],x4[n]]);

	// Our betas will be the hierarchical slope parameters (2 x 4)
	beta = -1*[b[n,1] * b[n,2:5], [b[n,1],0,0,b[n,1]]];
	// Our alphas are the intercept parameters (no availability term in this simplified model)
	alpha = [a[1], a[2], a[3], a[4]]';
	ptarget += categorical_logit_glm_lupmf(Y[n] | x, alpha, beta);
  }
}

The above model can be run with the following R script.

library(tidyverse)
library(cmdstanr)

database = read.csv("test_data.csv",header=TRUE)

# change personid to sequential values
database = database[order(database$personid),]

database$personid = cumsum(c(0, diff(database$personid)) != 0) + 1

model.1 = cmdstan_model("simple_model.stan", stanc_options = list("O1"))

samp_ct = nrow(database)

data_list1 = list(N = samp_ct,
                 ncat = 4L,
                 nvar = 2L,
                 ntotvar = 5L,
                 Y = as.integer(database$trptrans),
                 x1 = as.matrix(database%>%select(travelTimeDrive,travelCostDrive)),
                 x2 = cbind(database$travelTimeWalk,rep(0,samp_ct)),
                 x3 = cbind(database$travelTimeBike,rep(0,samp_ct)),
                 x4 = as.matrix(database%>%select(travelTimeTransit,travelCostTransit)),
                 N_1 = as.integer(max(database$personid)), # number of households
                 J_1 = as.integer(database$personid),
                 prior_only = 0L)

model_fit1 = model.1$sample(data = data_list1,
                           refresh = 100,
                           adapt_delta = 0.90,
                           seed = 1232,
                           iter_warmup  = 1000,
                           iter_sampling =1000,
                           chains = 2,
                           parallel_chains = 2)

print(model_fit1$summary(), n=16)

This Stan model does not replicate our test data. We also tried the optimizer ($optimize()) to see whether it would reproduce what we were getting with MSL, but it gives a similar result (with all of the sd parameters dropping to our lower bound of 0.001). The Stan results are below. The parameters are roughly in the right range, but the fit is sensitive: the sd estimates are too large given the sensitivity of the exp(). We also looked at the resulting dependent values, i.e., Pr(category = i) (a sketch of that check follows the table).

variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
mu_b1tt 3.01 3.01 0.261 0.253 2.59 3.44 1 414 356
mu_b2tt 2.98 2.98 0.249 0.241 2.58 3.39 1 266 254
mu_b3tt 3 2.99 0.242 0.229 2.63 3.43 1 436 506
mu_b4tt 1.99 1.98 0.255 0.275 1.59 2.41 1.01 376 422
b_a2 -0.832 -0.846 0.944 0.925 -2.44 0.683 1 374 412
b_a3 -1.03 -1.05 1 1.03 -2.64 0.529 1.01 384 457
b_a4 -0.903 -0.89 1.03 1.08 -2.6 0.711 1 409 649
sd1_bc 0.813 0.707 0.585 0.648 0.0626 1.93 1 162 207
sd1_b1tt 0.875 0.722 0.693 0.666 0.0636 2.32 1.01 341 252
sd1_b2tt 0.897 0.698 0.741 0.668 0.0617 2.46 1 191 113
sd1_b3tt 0.919 0.731 0.723 0.65 0.0638 2.37 1.01 146 193
sd1_b4tt 0.814 0.667 0.627 0.628 0.0577 2.08 1.01 164 215
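For the probability check mentioned above, we plug posterior means into the same utility form the Stan model uses and compare predicted against observed shares. This is only a sketch: it assumes the fit object model_fit1 and the re-indexed database from the script above, and the helper names are ours.

library(posterior)

# Posterior means of the individual-level coefficients and the intercepts
dm      <- as_draws_matrix(model_fit1$draws())
mean_of <- function(pat) colMeans(subset_draws(dm, variable = pat, regex = TRUE))

b_bc  <- mean_of("^b_bc\\[")     # length N_1
b_tt1 <- mean_of("^b_b1tt\\[")
b_tt2 <- mean_of("^b_b2tt\\[")
b_tt3 <- mean_of("^b_b3tt\\[")
b_tt4 <- mean_of("^b_b4tt\\[")
a     <- c(0, mean_of("^b_a2$"), mean_of("^b_a3$"), mean_of("^b_a4$"))

j  <- database$personid   # grouping index per observation
V1 <-        -b_bc[j] * (b_tt1[j] * database$travelTimeDrive   + database$travelCostDrive)
V2 <- a[2] - b_bc[j] * (b_tt2[j] * database$travelTimeWalk)
V3 <- a[3] - b_bc[j] * (b_tt3[j] * database$travelTimeBike)
V4 <- a[4] - b_bc[j] * (b_tt4[j] * database$travelTimeTransit + database$travelCostTransit)

V <- cbind(V1, V2, V3, V4)
P <- exp(V - apply(V, 1, max)) / rowSums(exp(V - apply(V, 1, max)))
colMeans(P)                           # predicted shares by alternative
prop.table(table(database$trptrans))  # observed shares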

We also ran the model in the same software that did the MSL, this time with hierarchical Bayes (the standard Metropolis-within-Gibbs sampler developed by Train in the early 2000s). We specify lognormal priors for the time/cost coefficients and normal priors for the intercepts/alphas. The results look fine in that implementation too, so it seems to be a Stan coding error rather than a synthetic-data coding error.

The other package won’t run our final model because we have a few features it can’t handle (e.g., multiple hierarchical structures). This model runs in about 10 minutes.

Results from our Bayes run in our validation package.

Model name                                  : MMNL_wtp_space_inter_HB
Model description                           : Mixed logit model on synthetic data using HB
Model run at                                : 2023-06-13 14:03:02.913994
Estimation method                           : Hierarchical Bayes
Number of individuals                       : 1056
Number of rows in database                  : 2000
Number of modelled outcomes                 : 2000
Number of cores used                        :  1 
Estimation carried out using RSGHB
Burn-in iterations                          : 10000
Post burn-in iterations                     : 50000
Classical model fit statistics were calculated at parameter values obtained using averaging
  across the post burn-in iterations.
LL(start)                                   : -1389.27
LL at equal shares, LL(0)                   : -2772.59
LL at observed shares, LL(C)                : -2173.76
LL(final)                                   : -1133.35
Rho-squared vs equal shares                  :  0.5912 
Adj.Rho-squared vs equal shares              :  0.5833 
Rho-squared vs observed shares               :  0.4786 
Adj.Rho-squared vs observed shares           :  0.4685 
AIC                                         :  2310.69 
BIC                                         :  2433.91 
Equiv. estimated parameters                 :  22
 (non-random parameters                     :  3)
 (means of random parameters                :  4)
 (covariance matrix terms                   :  15)
Time taken (hh:mm:ss)                       :  00:12:43.28 
     pre-estimation                         :  00:00:5.39 
     estimation                             :  00:12:35.85 
     post-estimation                        :  00:00:2.04 

Summary of parameter chains
Non-random coefficients 
         Mean     SD
asc_1  0.0000     NA
asc_2 -0.9485 0.4641
asc_3 -1.3236 0.2932
asc_4 -1.0855 0.2976
Results for posterior means for random coefficients 
         Mean     SD
b_tc  -1.6011 0.4481
v_tt1 18.5098 2.6511
v_tt2 13.5224 0.9949
v_tt3 16.0203 3.0750
v_tt4 12.8103 1.0895
Summary of distributions of random coefficients (after distributional transforms) 
         Mean     SD
b_tc  -1.5868 1.8517
v_tt1 18.3298 8.4095
v_tt2 13.4289 5.5528
v_tt3 15.8304 9.0999
v_tt4 12.7675 5.3353
Upper level model results for mean parameters for underlying Normals 
        Mean     SD
b_tc  0.0000 0.0000
v_tt1 2.8187 0.0950
v_tt2 2.5153 0.1418
v_tt3 2.6236 0.1115
v_tt4 2.4657 0.1046