How to use prior_aux autoscale=FALSE?

OS: Ubuntu 18.04.2
rstanarm: 2.18.2

I’m trying to shift from rstan to setting up models with rstanarm: multilevel logistic regressions, with observations nested within counties. We typically use the Cauchy distribution for variance parameters. I’ve tried using prior_aux = cauchy(0, 2.5, autoscale=FALSE) and it does not appear to work … checking the priors with prior_summary displays only the intercept, coefficients, and covariance. Checking the model code that is generated, it appears the model uses the gamma distribution for the variance parameters?

Any tips/suggestions would be much appreciated!

prior_test <- stan_glmer(y ~ femf + (1|agef) + (1|eduf) +
                          (1|racf) + (1|cntyf),
                          data=nysubdat, family=binomial("logit"),
                          chains=1, iter=500)


prior_summary(prior_test)

Priors for model 'prior_test'
------
Intercept (after predictors centered)
 ~ normal(location = 0, scale = 10)

Coefficients
 ~ normal(location = 0, scale = 2.5)

Covariance
 ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)

test_no_autoscale <- update(prior_test,
                          prior = normal(0, 5, autoscale=FALSE),
                          prior_intercept=normal(0, 10, autoscale=FALSE),
                          prior_aux=cauchy(0, 2.5, autoscale=FALSE))

prior_summary(test_no_autoscale)

Priors for model 'test_no_autoscale'
------
Intercept (after predictors centered)
 ~ normal(location = 0, scale = 5)

Coefficients
 ~ normal(location = 0, scale = 5)

Covariance
 ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
 
test_no_autoscale$stanfit@model_pars
 [1] "gamma"           "z_beta"          "z_beta_smooth"   "smooth_sd_raw"
 [5] "global"          "local"           "caux"            "mix"
 [9] "one_over_lambda" "z_b"             "z_T"             "rho"
[13] "zeta"            "tau"             "beta"            "beta_smooth"
[17] "smooth_sd"       "b"               "theta_L"         "mean_PPD"
[21] "alpha"           "lp__"

The prior on the measurement error in the observed outcomes should be half-Cauchy, although it does not seem to be reflected in the prior_summary. The variance in the group-level parameters across groups is not half-Cauchy, and its family actually cannot be changed, but there are arguments to decov to change its hyperparameters (see the sketch below). That prior is a scaled simplex on the variances (not the standard deviations).
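
For reference, a minimal sketch of changing those hyperparameters (the argument names come from rstanarm's decov() documentation; prior_test2 is just an illustrative name and the values here are arbitrary):

# the family of the prior on the group-level (co)variances is fixed,
# but its hyperparameters can be set via prior_covariance = decov(...)
prior_test2 <- update(prior_test,
                      prior_covariance = decov(regularization = 2,
                                               concentration = 1,
                                               shape = 1, scale = 1))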

Thanks! I thought I’d save some space by just listing the parameters … Here’s the model code that is generated. I’m not really understanding from the code where the variance components are assigned half-Cauchy priors … is it done in some of the #includes?

#include /pre/Columbia_copyright.stan
#include /pre/license.stan

// GLM for a Bernoulli outcome
functions {
#include /functions/common_functions.stan
#include /functions/bernoulli_likelihoods.stan
}
data {
  // dimensions
  int<lower=0> K; // number of predictors
  int<lower=1> N[2]; // number of observations where y = 0 and y = 1 respectively
  vector[K] xbar; // vector of column-means of rbind(X0, X1)
  int<lower=0,upper=1> dense_X; // flag for dense vs. sparse
  matrix[N[1],K] X0[dense_X]; // centered (by xbar) predictor matrix | y = 0
  matrix[N[2],K] X1[dense_X]; // centered (by xbar) predictor matrix | y = 1

  int<lower=0, upper=1> clogit; // 1 iff the number of successes is fixed in each stratum
  int<lower=0> J; // number of strata (possibly zero)
  int<lower=1,upper=J> strata[clogit == 1 ? N[1] + N[2] : 0];

  // stuff for the sparse case
  int<lower=0> nnz_X0; // number of non-zero elements in the implicit X0 matrix
  vector[nnz_X0] w_X0; // non-zero elements in the implicit X0 matrix
  int<lower=0, upper = K - 1> v_X0[nnz_X0]; // column indices for w_X0
  // where the non-zeros start in each row of X0
  int<lower=0, upper = rows(w_X0) + 1> u_X0[dense_X ? 0 : N[1] + 1];
  int<lower=0> nnz_X1; // number of non-zero elements in the implicit X1 matrix
  vector[nnz_X1] w_X1; // non-zero elements in the implicit X1 matrix
  int<lower=0, upper = K - 1> v_X1[nnz_X1]; // column indices for w_X1
  // where the non-zeros start in each row of X1
  int<lower=0, upper = rows(w_X1) + 1> u_X1[dense_X ? 0 : N[2] + 1];
  // declares prior_PD, has_intercept, link, prior_dist, prior_dist_for_intercept
#include /data/data_glm.stan

  int<lower=0> K_smooth;
  matrix[N[1], K_smooth] S0;
  matrix[N[2], K_smooth] S1;
  int<lower=1> smooth_map[K_smooth];

  int<lower=5,upper=5> family;

  // weights
  int<lower=0,upper=1> has_weights; // 0 = No, 1 = Yes
  vector[has_weights ? N[1] : 0] weights0;
  vector[has_weights ? N[2] : 0] weights1;

  // offset
  int<lower=0,upper=1> has_offset; // 0 = No, 1 = Yes
  vector[has_offset ? N[1] : 0] offset0;
  vector[has_offset ? N[2] : 0] offset1;

  // declares prior_{mean, scale, df}, prior_{mean, scale, df}_for_intercept, prior_{mean, scale, df}_for_aux
#include /data/hyperparameters.stan
  // declares t, p[t], l[t], q, len_theta_L, shape, scale, {len_}concentration, {len_}regularization
#include /data/glmer_stuff.stan

  // more glmer stuff
  int<lower=0> num_non_zero[2]; // number of non-zero elements in the Z matrices
  vector[num_non_zero[1]] w0; // non-zero elements in the implicit Z0 matrix
  vector[num_non_zero[2]] w1; // non-zero elements in the implicit Z1 matrix
  int<lower=0, upper = q - 1> v0[num_non_zero[1]]; // column indices for w0
  int<lower=0, upper = q - 1> v1[num_non_zero[2]]; // column indices for w1
  // where the non-zeros start in each row of Z0
  int<lower=0, upper = rows(w0) + 1> u0[t > 0 ? N[1] + 1 : 0];
  // where the non-zeros start in each row of Z1
  int<lower=0, upper = rows(w1) + 1> u1[t > 0 ? N[2] + 1 : 0];
  int<lower=0, upper=1> special_case; // whether we only have to deal with (1|group)
}
transformed data {
  int NN = N[1] + N[2];
  real aux = not_a_number();
  int<lower=1> V0[special_case ? t : 0,N[1]] = make_V(N[1], special_case ? t : 0, v0);
  int<lower=1> V1[special_case ? t : 0,N[2]] = make_V(N[2], special_case ? t : 0, v1);
  int<lower=0> successes[clogit ? J : 0];
  int<lower=0> failures[clogit ? J : 0];
  int<lower=0> observations[clogit ? J : 0];
  // defines hs, len_z_T, len_var_group, delta, pos
#include /tdata/tdata_glm.stan
  for (j in 1:J) {
    successes[j] = 0;
    failures[j] = 0;
  }
  if (J > 0) for (i in 1:N[2]) successes[strata[i]] += 1;
  if (J > 0) for (i in (N[2] + 1):NN) failures[strata[i]] += 1;
  for (j in 1:J) observations[j] = failures[j] + successes[j];
}
parameters {
  real<upper=(link == 4 ? 0.0 : positive_infinity())> gamma[has_intercept];
  // declares z_beta, global, local, z_b, z_T, rho, zeta, tau
#include /parameters/parameters_glm.stan
}
transformed parameters {
  // defines beta, b, theta_L
#include /tparameters/tparameters_glm.stan
  if (t > 0) {
    if (special_case) {
      int start = 1;
      theta_L = scale .* tau;
      if (t == 1) b = theta_L[1] * z_b;
      else for (i in 1:t) {
        int end = start + l[i] - 1;
        b[start:end] = theta_L[i] * z_b[start:end];
        start = end + 1;
      }
    }
    else {
      theta_L = make_theta_L(len_theta_L, p,
                             1.0, tau, scale, zeta, rho, z_T);
      b = make_b(z_b, theta_L, p, l);
    }
  }
}
model {
  // defines eta0, eta1
#include /model/make_eta_bern.stan
  if (has_intercept == 1) {
    if (link != 4) {
      eta0 += gamma[1];
      eta1 += gamma[1];
    }
    else {
      real shift = fmax(max(eta0), max(eta1));
      eta0 += gamma[1] - shift;
      eta1 += gamma[1] - shift;
    }
  }
  // Log-likelihood
  if (clogit && prior_PD == 0) {
    real dummy = ll_clogit_lp(eta0, eta1, successes, failures, observations);
  }
  else if (has_weights == 0 && prior_PD == 0) {
    real dummy = ll_bern_lp(eta0, eta1, link, N);
  }
  else if (prior_PD == 0) { // weighted log-likelihoods
    target += dot_product(weights0, pw_bern(0, eta0, link));
    target += dot_product(weights1, pw_bern(1, eta1, link));
  }

#include /model/priors_glm.stan
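  // this is where the group-level (co)variances get their prior:
  // decov_lp() applies the decov prior (not a half-Cauchy) to tau, zeta, rho, z_T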
  if (t > 0) {
    real dummy = decov_lp(z_b, z_T, rho, zeta, tau,
                          regularization, delta, shape, t, p);
  }
}
generated quantities {
  real mean_PPD = compute_mean_PPD ? 0 : negative_infinity();
  real alpha[has_intercept];

  if (has_intercept == 1) {
    if (dense_X) alpha[1] = gamma[1] - dot_product(xbar, beta);
    else alpha[1] = gamma[1];
  }

  if (compute_mean_PPD) {
    vector[N[1]] pi0;
    vector[N[2]] pi1;
    // defines eta0, eta1
#include /model/make_eta_bern.stan
    if (has_intercept == 1) {
      if (link != 4) {
        eta0 += gamma[1];
        eta1 += gamma[1];
      }
      else {
        real shift;
        shift = fmax(max(eta0), max(eta1));
        eta0 += gamma[1] - shift;
        eta1 += gamma[1] - shift;
        alpha[1] -= shift;
      }
    }
    if (clogit) for (j in 1:J) mean_PPD += successes[j]; // fixed by design
    else {
      pi0 = linkinv_bern(eta0, link);
      pi1 = linkinv_bern(eta1, link);
      for (n in 1:N[1]) mean_PPD += bernoulli_rng(pi0[n]);
      for (n in 1:N[2]) mean_PPD += bernoulli_rng(pi1[n]);
    }
    mean_PPD /= NN;
  }
}

It is easier to post links to GitHub, although the Stan programs in rstanarm are not really intended to be read by humans anyway. For the case of a continuous outcome, the prior on the error standard deviation can be half-Student t (which includes the half-Cauchy as the special case with one degree of freedom), but the prior is actually applied to an unscaled but positive parameter that is then multiplied by prior_scale_aux when evaluating the log-likelihood.
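
So with a Gaussian outcome, prior_aux should behave as you expected. A minimal sketch (fit_gauss is just an illustrative name, and lme4's sleepstudy data stand in for real data, since the original data are not available):

# with a Gaussian outcome the model has an auxiliary parameter (sigma),
# so prior_aux is actually used and reported
library(rstanarm)
fit_gauss <- stan_glmer(Reaction ~ Days + (1 | Subject),
                        data = lme4::sleepstudy, family = gaussian(),
                        prior_aux = cauchy(0, 2.5, autoscale = FALSE),
                        chains = 1, iter = 500)
prior_summary(fit_gauss) # should now include an 'Auxiliary (sigma)' entry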

In the case of a Bernoulli / binomial outcome like the one you pasted, there is no auxiliary parameter, so prior_aux has nothing to apply to. But in neither case do the standard deviations or variances of the group-level parameters have half-Cauchy priors in rstanarm.
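
If you want to see what the decov prior implies for the group-level standard deviations, one option (a sketch reusing the model fit above; prior_only and sigma_draws are illustrative names) is to sample from the prior alone with prior_PD = TRUE and summarize the draws:

# draw from the prior only (no likelihood); the group-level variances
# show up in the draws as parameters named 'Sigma[...]'
prior_only <- update(prior_test, prior_PD = TRUE)
sigma_draws <- as.matrix(prior_only, regex_pars = "^Sigma")
summary(sqrt(sigma_draws)) # implied prior on the group-level standard deviations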