How to use prior_aux autoscale=FALSE?

OS: Ubuntu 18.04.2
rstanarm: 2.18.2

I’m trying to shift from rstan to setting up models with rstanarm: multilevel logistic regressions, with observations nested within counties. We typically use the Cauchy distribution for variance parameters. I’ve tried using prior_aux = cauchy(0, 2.5, autoscale=FALSE) and it does not appear to work … checking the priors with prior_summary displays only the intercept, coefficients, and covariance. Checking the model code that is generated, it appears the model uses the gamma distribution for the variance parameters?

Any tips/suggestions would be much appreciated!

prior_test <- stan_glmer(y ~ femf + (1|agef) + (1|eduf) +
                          (1|racf) + (1|cntyf),
                          data=nysubdat, family=binomial("logit"),
                          chains=1, iter=500)


prior_summary(prior_test)

Priors for model 'prior_test'
------
Intercept (after predictors centered)
 ~ normal(location = 0, scale = 10)

Coefficients
 ~ normal(location = 0, scale = 2.5)

Covariance
 ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)

test_no_autoscale <- update(prior_test,
                          prior = normal(0, 5, autoscale=FALSE),
                          prior_intercept=normal(0, 10, autoscale=FALSE),
                          prior_aux=cauchy(0, 2.5, autoscale=FALSE))

prior_summary(test_no_autoscale)

Priors for model 'test_no_autoscale'
------
Intercept (after predictors centered)
 ~ normal(location = 0, scale = 5)

Coefficients
 ~ normal(location = 0, scale = 5)

Covariance
 ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
 
test_no_autoscale$stanfit@model_pars
 [1] "gamma"           "z_beta"          "z_beta_smooth"   "smooth_sd_raw"
 [5] "global"          "local"           "caux"            "mix"
 [9] "one_over_lambda" "z_b"             "z_T"             "rho"
[13] "zeta"            "tau"             "beta"            "beta_smooth"
[17] "smooth_sd"       "b"               "theta_L"         "mean_PPD"
[21] "alpha"           "lp__"

The prior on the measurement error in the observed outcomes should be half-Cauchy, although it does not seem to be reflected in the prior_summary. The variance in the group-level parameters across groups is not half-Cauchy, and its family actually cannot be changed, but there are arguments to decov to change its hyperparameters (see the sketch below). That prior is a scaled simplex on the variances (not the standard deviations).
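
For reference, a minimal sketch of changing those hyperparameters (the argument names come from rstanarm's decov() documentation; prior_test2 is just an illustrative name and the values here are arbitrary):

# the family of the prior on the group-level (co)variances is fixed,
# but its hyperparameters can be set via prior_covariance = decov(...)
prior_test2 <- update(prior_test,
                      prior_covariance = decov(regularization = 2,
                                               concentration = 1,
                                               shape = 1, scale = 1))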

Thanks! I thought I’d save some space by just listing the parameters … Here’s the model code that is generated. I’m not really understanding from the code where the variance components are assigned half-Cauchy priors … is it done in some of the #includes?

#include /pre/Columbia_copyright.stan
#include /pre/license.stan

// GLM for a Bernoulli outcome
functions {
#include /functions/common_functions.stan
#include /functions/bernoulli_likelihoods.stan
}
data {
  // dimensions
  int<lower=0> K; // number of predictors
  int<lower=1> N[2]; // number of observations where y = 0 and y = 1 respectively
  vector[K] xbar; // vector of column-means of rbind(X0, X1)
  int<lower=0,upper=1> dense_X; // flag for dense vs. sparse
  matrix[N[1],K] X0[dense_X]; // centered (by xbar) predictor matrix | y = 0
  matrix[N[2],K] X1[dense_X]; // centered (by xbar) predictor matrix | y = 1

  int<lower=0, upper=1> clogit; // 1 iff the number of successes is fixed in each stratum
  int<lower=0> J; // number of strata (possibly zero)
  int<lower=1,upper=J> strata[clogit == 1 ? N[1] + N[2] : 0];

  // stuff for the sparse case
  int<lower=0> nnz_X0; // number of non-zero elements in the implicit X0 matrix
  vector[nnz_X0] w_X0; // non-zero elements in the implicit X0 matrix
  int<lower=0, upper = K - 1> v_X0[nnz_X0]; // column indices for w_X0
  // where the non-zeros start in each row of X0
  int<lower=0, upper = rows(w_X0) + 1> u_X0[dense_X ? 0 : N[1] + 1];
  int<lower=0> nnz_X1; // number of non-zero elements in the implicit X1 matrix
  vector[nnz_X1] w_X1; // non-zero elements in the implicit X1 matrix
  int<lower=0, upper = K - 1> v_X1[nnz_X1]; // column indices for w_X1
  // where the non-zeros start in each row of X1
  int<lower=0, upper = rows(w_X1) + 1> u_X1[dense_X ? 0 : N[2] + 1];
  // declares prior_PD, has_intercept, link, prior_dist, prior_dist_for_intercept
#include /data/data_glm.stan

  int<lower=0> K_smooth;
  matrix[N[1], K_smooth] S0;
  matrix[N[2], K_smooth] S1;
  int<lower=1> smooth_map[K_smooth];

  int<lower=5,upper=5> family;

  // weights
  int<lower=0,upper=1> has_weights; // 0 = No, 1 = Yes
  vector[has_weights ? N[1] : 0] weights0;
  vector[has_weights ? N[2] : 0] weights1;

  // offset
  int<lower=0,upper=1> has_offset; // 0 = No, 1 = Yes
  vector[has_offset ? N[1] : 0] offset0;
  vector[has_offset ? N[2] : 0] offset1;

  // declares prior_{mean, scale, df}, prior_{mean, scale, df}_for_intercept, prior_{mean, scale, df}_for_aux
#include /data/hyperparameters.stan
  // declares t, p[t], l[t], q, len_theta_L, shape, scale, {len_}concentration, {len_}regularization
#include /data/glmer_stuff.stan

  // more glmer stuff
  int<lower=0> num_non_zero[2]; // number of non-zero elements in the Z matrices
  vector[num_non_zero[1]] w0; // non-zero elements in the implicit Z0 matrix
  vector[num_non_zero[2]] w1; // non-zero elements in the implicit Z1 matrix
  int<lower=0, upper = q - 1> v0[num_non_zero[1]]; // column indices for w0
  int<lower=0, upper = q - 1> v1[num_non_zero[2]]; // column indices for w1
  // where the non-zeros start in each row of Z0
  int<lower=0, upper = rows(w0) + 1> u0[t > 0 ? N[1] + 1 : 0];
  // where the non-zeros start in each row of Z1
  int<lower=0, upper = rows(w1) + 1> u1[t > 0 ? N[2] + 1 : 0];
  int<lower=0, upper=1> special_case; // whether we only have to deal with (1|group)
}
transformed data {
  int NN = N[1] + N[2];
  real aux = not_a_number();
  int<lower=1> V0[special_case ? t : 0,N[1]] = make_V(N[1], special_case ? t : 0, v0);
  int<lower=1> V1[special_case ? t : 0,N[2]] = make_V(N[2], special_case ? t : 0, v1);
  int<lower=0> successes[clogit ? J : 0];
  int<lower=0> failures[clogit ? J : 0];
  int<lower=0> observations[clogit ? J : 0];
  // defines hs, len_z_T, len_var_group, delta, pos
#include /tdata/tdata_glm.stan
  for (j in 1:J) {
    successes[j] = 0;
    failures[j] = 0;
  }
  if (J > 0) for (i in 1:N[2]) successes[strata[i]] += 1;
  if (J > 0) for (i in (N[2] + 1):NN) failures[strata[i]] += 1;
  for (j in 1:J) observations[j] = failures[j] + successes[j];
}
parameters {
  real<upper=(link == 4 ? 0.0 : positive_infinity())> gamma[has_intercept];
  // declares z_beta, global, local, z_b, z_T, rho, zeta, tau
#include /parameters/parameters_glm.stan
}
transformed parameters {
  // defines beta, b, theta_L
#include /tparameters/tparameters_glm.stan
  if (t > 0) {
    if (special_case) {
      int start = 1;
      theta_L = scale .* tau;
      if (t == 1) b = theta_L[1] * z_b;
      else for (i in 1:t) {
        int end = start + l[i] - 1;
        b[start:end] = theta_L[i] * z_b[start:end];
        start = end + 1;
      }
    }
    else {
      theta_L = make_theta_L(len_theta_L, p,
                             1.0, tau, scale, zeta, rho, z_T);
      b = make_b(z_b, theta_L, p, l);
    }
  }
}
model {
  // defines eta0, eta1
#include /model/make_eta_bern.stan
  if (has_intercept == 1) {
    if (link != 4) {
      eta0 += gamma[1];
      eta1 += gamma[1];
    }
    else {
      real shift = fmax(max(eta0), max(eta1));
      eta0 += gamma[1] - shift;
      eta1 += gamma[1] - shift;
    }
  }
  // Log-likelihood
  if (clogit && prior_PD == 0) {
    real dummy = ll_clogit_lp(eta0, eta1, successes, failures, observations);
  }
  else if (has_weights == 0 && prior_PD == 0) {
    real dummy = ll_bern_lp(eta0, eta1, link, N);
  }
  else if (prior_PD == 0) { // weighted log-likelihoods
    target += dot_product(weights0, pw_bern(0, eta0, link));
    target += dot_product(weights1, pw_bern(1, eta1, link));
  }

#include /model/priors_glm.stan
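  // this is where the group-level (co)variances get their prior:
  // decov_lp() applies the decov prior (not a half-Cauchy) to tau, zeta, rho, z_T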
  if (t > 0) {
    real dummy = decov_lp(z_b, z_T, rho, zeta, tau,
                          regularization, delta, shape, t, p);
  }
}
generated quantities {
  real mean_PPD = compute_mean_PPD ? 0 : negative_infinity();
  real alpha[has_intercept];

  if (has_intercept == 1) {
    if (dense_X) alpha[1] = gamma[1] - dot_product(xbar, beta);
    else alpha[1] = gamma[1];
  }

  if (compute_mean_PPD) {
    vector[N[1]] pi0;
    vector[N[2]] pi1;
    // defines eta0, eta1
#include /model/make_eta_bern.stan
    if (has_intercept == 1) {
      if (link != 4) {
        eta0 += gamma[1];
        eta1 += gamma[1];
      }
      else {
        real shift;
        shift = fmax(max(eta0), max(eta1));
        eta0 += gamma[1] - shift;
        eta1 += gamma[1] - shift;
        alpha[1] -= shift;
      }
    }
    if (clogit) for (j in 1:J) mean_PPD += successes[j]; // fixed by design
    else {
      pi0 = linkinv_bern(eta0, link);
      pi1 = linkinv_bern(eta1, link);
      for (n in 1:N[1]) mean_PPD += bernoulli_rng(pi0[n]);
      for (n in 1:N[2]) mean_PPD += bernoulli_rng(pi1[n]);
    }
    mean_PPD /= NN;
  }
}

It is easier to post links to GitHub, although the Stan programs in rstanarm are not really intended to be read by humans anyway. For the case of a continuous outcome, the prior on the error standard deviation can be half-Student t (which includes the half-Cauchy as the special case with one degree of freedom), but the prior is actually applied to an unscaled but positive parameter that is then multiplied by prior_scale_aux when evaluating the log-likelihood.
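
So with a Gaussian outcome, prior_aux should behave as you expected. A minimal sketch (fit_gauss is just an illustrative name, and lme4's sleepstudy data stand in for real data, since the original data are not available):

# with a Gaussian outcome the model has an auxiliary parameter (sigma),
# so prior_aux is actually used and reported
library(rstanarm)
fit_gauss <- stan_glmer(Reaction ~ Days + (1 | Subject),
                        data = lme4::sleepstudy, family = gaussian(),
                        prior_aux = cauchy(0, 2.5, autoscale = FALSE),
                        chains = 1, iter = 500)
prior_summary(fit_gauss) # should now include an 'Auxiliary (sigma)' entry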

In the case of a Bernoulli / binomial outcome like the one you pasted, there is no auxiliary parameter, so prior_aux has nothing to apply to. But in neither case do the standard deviations or variances of the group-level parameters have half-Cauchy priors in rstanarm.
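
If you want to see what the decov prior implies for the group-level standard deviations, one option (a sketch reusing the model fit above; prior_only and sigma_draws are illustrative names) is to sample from the prior alone with prior_PD = TRUE and summarize the draws:

# draw from the prior only (no likelihood); the group-level variances
# show up in the draws as parameters named 'Sigma[...]'
prior_only <- update(prior_test, prior_PD = TRUE)
sigma_draws <- as.matrix(prior_only, regex_pars = "^Sigma")
summary(sqrt(sigma_draws)) # implied prior on the group-level standard deviations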