Unequal variances in groups

Hi Everyone,

I have some toy data below that simulates having 2 groups with heteroskedasticity. The variance of Y increases as the independent variable X increases:

Untitled1

I was looking to model the variances as mentioned here:
https://jrnold.github.io/bayesian_notes/heteroskedasticity.html

"It is common to model the log-transformation of the scale or variance to transform it to R, log \sigma_i = Normal(Z_i\gamma,\omega) where Z_i are covariates used to the model the variance, which may or may not be the same as X_i"

My attempt at modeling the variances is in the transformed parameters block.

  • Can you advise if the modeling of the variance in the transformed parameters block is correct?

  • When I look at the ppc, the median for group A is not near the observed. Any thoughts on iimproving the model to better capture group A’s median?

Untitled2

  • Parameter plots are below and there are high correlations between the group intercepts and the slopes. Is this something to be concerned about?

Thank you in advance!

Untitled3
r code:

library(dplyr)
library(ggplot2)
library(rstan)
library(bayesplot)
library(gridExtra)
set.seed(45)
n = 2000
group= sample(c("A","B"),n,replace= TRUE)
x= sample(seq(1,30,.25),n,replace = TRUE)
y = rep(0,n)
table(group)
for(i in 1:n){
  #i=1
  if(x[i]<10){
    a = 3
    b=8
  }else if(x[i]>10 & x[i]<20){
    a=7
    b=13
  }else{
    a=12
    b=19
  }
  #a=5
  #b=5
  y[i] = ifelse(group[i]=="A",.8,.7)*x[i]+ 
    ifelse(group[i]=="A",8,-2)+
    rnorm(1,0,ifelse(group[i]=="A",a,b))
}
d = data.frame(x=x, y=y,group=group)
#plot
ggplot(d, aes(x=x, y = y,color = group))+geom_point()+geom_smooth(method ="lm")+facet_wrap(~group)
data = data.frame(x =x , y = y, group = group)
#stan input
input = list(X =data$x , Y= data$y,N= length(data$x), N_group=2, group = as.numeric(data$group))
fit = stan(file = "mystanfile.stan",data=input,seed=233,chains = 2)
fit
params = rstan::extract(fit)
y = data$y
yrep = params$y_ppc
grid.arrange(
  ppc_stat_grouped(y, yrep, group = data$group, stat = "median"),
  ppc_stat_grouped(y, yrep, group = data$group, stat = "mean")
)
post = as.array(fit)
mcmc_pairs( post, np =nuts_params(fit) ,  
            pars = c(  #"mu_alpha[1]","mu_alpha[2]",
              #"mu_beta[1]","mu_beta[2]",
                    # "sig_alpha[1]","sig_alpha[2]", "sig_beta[1]","sig_beta[2]","sig[1]","sig[2]"
   "sig_beta","sig_group[1]","sig_group[2]"
                     ), 
           off_diag_args = list(size = 0.7))

stan code:


data{
  int<lower=1> N;
  vector[N] X;
  int N_group;
  int<lower=1, upper =N_group> group[N];
  real Y[N]; 
}
parameters{
  vector[N_group] mu_alpha; 
  vector[N_group] mu_beta;
  vector[N_group] sig_group;
  vector[N_group] sig_beta;
}
transformed parameters{
  vector[N] log_sig;
  vector<lower=0>[N] sig;
  for(n in 1:N){
    log_sig[n] = sig_beta[group[n]]*X[n]+sig_group[group[n]];
    sig[n] = exp(log_sig[n]);
  }
}
model{
  mu_alpha~normal(0,5);
  mu_beta~normal(0,5);
  
  sig~normal(0,20);
  log_sig~normal(0,20);
  //sig_group ~normal(0,20);
  sig_beta ~normal(0,20);
  sig_group ~normal(0,20);
  
  for(n in 1:N){
    Y[n] ~ normal(mu_alpha[group[n]]+mu_beta[group[n]]*X[n], sig[n]  );
  }
}
generated quantities{
  vector[N] y_ppc;
  for(n in 1:N){
    y_ppc[n] = normal_rng( mu_alpha[group[n]]+mu_beta[group[n]]*X[n],sig[n] ) ;
  }
}

2 Likes

Hi! First of all: very nice post! :)

I rewrote your model slightly (a bit more concise and efficient) and found that you have put a prior on log_sig and sig. You don’t have to do that! Generally, you only want to put priors on the stuff that’s in the parameters block. The transformed parameters are just that: transformed parameters, so the priors you put on your (“proper”) parameters imply the priors for the transformed parameters. So here’s the model:

data{
  int<lower=1> N;
  vector[N] X;
  int N_group;
  int<lower=1, upper=N_group> group[N];
  vector[N] Y; 
}
parameters{
  vector[N_group] mu_alpha; 
  vector[N_group] mu_beta;
  vector[N_group] sig_group;
  vector[N_group] sig_beta;
}
transformed parameters{
  vector<lower=0>[N] sig = exp(sig_beta[group] .* X + sig_group[group]);
}
model{
  mu_alpha ~ normal(0,5);
  mu_beta ~ normal(0,5);
  
  sig_beta ~ normal(0,20); // these should probably be tighter
  sig_group ~ normal(0,20); // these should probably be tighter
  
  Y ~ normal(mu_alpha[group] + mu_beta[group] .* X, sig);
  
}
generated quantities{
  vector[N] y_ppc;
  for(n in 1:N){
    y_ppc[n] = normal_rng( mu_alpha[group[n]]+mu_beta[group[n]]*X[n],sig[n] ) ;
  }
}

And when I looked at the plots I found:


…so I guess this is two birds with one stone! :)

Parameter plots are below and there are high correlations between the group intercepts and the slopes. Is this something to be concerned about?

This is very common in multilevel modeling! Generally, slopes and intercepts are correlated. To get a feeling for this, draw a scatterplot and a regression line through it. Now, changing the slope will automatically change the intercept (unless the data are centered around zero).

And usually this is taken into account by applying a multivariate normal prior on the slope and intercept parameter. I found Chapter 13 of the Gelman/Hill textbook to be a nice resource for learning about this, but this should also be covered in other textbooks (I think the cover of Statistical Rethinking is a plot of exactly this phenomenon).

Cheers! :)

8 Likes

Thanks for the response. I am looking to run a prior predicitve check for the model above.

code to run ppc:

input = list(X =data$x , N= length(data$x), N_group=2, group = as.numeric(data$group))
fit ← stan(file=‘prior.stan’, data=input, iter=1000, warmup=0, chains=1, seed=4838282, algorithm=“Fixed_param”)

Running prior.stan returns parametes with NAs:
see: extract(fit)$mu_alpha[,1]

and while running there are messages: Chain 1: Exception: normal_rng: Scale parameter is inf, but must be finite!

The code for prior.stan is below. There are 4 parameters and 2 groups so the “for(i in 1:N_group)” loop creates 8 random parameters but extract(fit)$mu_alpha[,1] shows NAs.

Any idea how to set up the ppc ? Is vector<lower=0> sig[N]; necessary in generated quantities? Thank you in advance.

Here is prior.stan:

data{
int<lower=1> N;
vector[N] X;
int N_group;
int<lower=1, upper =N_group> group[N];
}
generated quantities{
real y[N];
vector[N_group] mu_alpha;
vector[N_group] mu_beta;
vector[N_group] sig_beta;
vector[N_group] sig_group;
vector<lower=0>[N] sig;
for(i in 1:N_group){
mu_alpha[i] = normal_rng(0,5);
mu_beta[i] = normal_rng(0,5);
sig_beta[i] = normal_rng(0,20);
sig_group[i] = normal_rng(0,20);
}
for(n in 1:N){
sig[n] = exp( sig_group[group[n]] + sig_beta[group[n]]*X[n] );
y[n] = normal_rng( mu_alpha[group[n]]+mu_beta[group[n]]*X[n], sig[n] );
}

Hey!

You’re on the right track. However, the priors imply huuuuuge standard deviations for the y's.

Running you code with iter=1 results in:

> print(fit, pars = c("mu_alpha", "mu_beta", "sig_beta", "sig_group"))
Inference for Stan model: prior.
1 chains, each with iter=1; warmup=0; thin=1; 
post-warmup draws per chain=1, total post-warmup draws=1.

               mean se_mean sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
mu_alpha[1]    1.52      NA NA   1.52   1.52   1.52   1.52   1.52     0  NaN
mu_alpha[2]   -2.75      NA NA  -2.75  -2.75  -2.75  -2.75  -2.75     0  NaN
mu_beta[1]     0.25      NA NA   0.25   0.25   0.25   0.25   0.25     0  NaN
mu_beta[2]    -6.57      NA NA  -6.57  -6.57  -6.57  -6.57  -6.57     0  NaN
sig_beta[1]    7.91      NA NA   7.91   7.91   7.91   7.91   7.91     0  NaN
sig_beta[2]   22.33      NA NA  22.33  22.33  22.33  22.33  22.33     0  NaN
sig_group[1]   6.98      NA NA   6.98   6.98   6.98   6.98   6.98     0  NaN
sig_group[2] -15.78      NA NA -15.78 -15.78 -15.78 -15.78 -15.78     0  NaN

Samples were drawn using (diag_e) at Mon Nov 25 13:32:21 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Knowing, that sig_beta and sig_group are on the log-scale (!) and model the residual standard deviation (not variance!) of y, you can already see, that there’s going to be problems with these priors.

Let’s look at sig:

> print(fit, pars = c("sig"))
Inference for Stan model: prior.
1 chains, each with iter=1; warmup=0; thin=1; 
post-warmup draws per chain=1, total post-warmup draws=1.

                   mean se_mean sd          2.5%           25%           50%           75%         97.5% n_eff Rhat
sig[1]     1.693891e+44      NA NA  1.693891e+44  1.693891e+44  1.693891e+44  1.693891e+44  1.693891e+44     0  NaN
sig[2]     2.262885e+31      NA NA  2.262885e+31  2.262885e+31  2.262885e+31  2.262885e+31  2.262885e+31     0  NaN
sig[3]    1.297325e+267      NA NA 1.297325e+267 1.297325e+267 1.297325e+267 1.297325e+267 1.297325e+267     0  NaN
sig[4]     5.747142e+53      NA NA  5.747142e+53  5.747142e+53  5.747142e+53  5.747142e+53  5.747142e+53     0  NaN
sig[5]     6.848769e+70      NA NA  6.848769e+70  6.848769e+70  6.848769e+70  6.848769e+70  6.848769e+70     0  NaN
sig[6]    4.797082e+184      NA NA 4.797082e+184 4.797082e+184 4.797082e+184 4.797082e+184 4.797082e+184     0  NaN
sig[7]     3.023011e+18      NA NA  3.023011e+18  3.023011e+18  3.023011e+18  3.023011e+18  3.023011e+18     0  NaN
sig[8]     9.491415e+69      NA NA  9.491415e+69  9.491415e+69  9.491415e+69  9.491415e+69  9.491415e+69     0  NaN
sig[9]    2.200787e+131      NA NA 2.200787e+131 2.200787e+131 2.200787e+131 2.200787e+131 2.200787e+131     0  NaN
sig[10]    3.079005e+24      NA NA  3.079005e+24  3.079005e+24  3.079005e+24  3.079005e+24  3.079005e+24     0  NaN
sig[11]    8.137791e+48      NA NA  8.137791e+48  8.137791e+48  8.137791e+48  8.137791e+48  8.137791e+48     0  NaN
sig[12]    8.347116e+27      NA NA  8.347116e+27  8.347116e+27  8.347116e+27  8.347116e+27  8.347116e+27     0  NaN
sig[13]    1.998118e+98      NA NA  1.998118e+98  1.998118e+98  1.998118e+98  1.998118e+98  1.998118e+98     0  NaN
sig[14]   2.730551e+160      NA NA 2.730551e+160 2.730551e+160 2.730551e+160 2.730551e+160 2.730551e+160     0  NaN
sig[15]   3.116252e+126      NA NA 3.116252e+126 3.116252e+126 3.116252e+126 3.116252e+126 3.116252e+126     0  NaN
sig[16]   4.569669e+281      NA NA 4.569669e+281 4.569669e+281 4.569669e+281 4.569669e+281 4.569669e+281     0  NaN
...
...

… these are huge!

You should probably consider lowering the priors for sig_beta and sig_group way down. Something like:

sig_beta[i] = normal_rng(0, 0.2);
sig_group[i] = normal_rng(0, 0.2);

…or even lower.

Thank you and is the lower =0 necessary in generated quantities? Are upper and lower bounds respected in generated quantities?

vector<lower=0>[N] sig;

I think bounds are respected, but are unnecessary here. sig[n] = exp(...) will surely be positive.

Also, a vector with lower bound is declared like this vector<lower=0>[N] sig;.

Cheers!