Changing prior from Gaussian to Student's t

My original model is

y_{ijk} \sim N(X_k b + z_{1i} + z_{2j}, \sigma)\\ z_{1i} \sim N(0, \theta_1), \quad z_{2j} \sim N(0, \theta_2)

with the following Stan code

data { 
  int<lower=1> N;  
  vector[N] Y;  
  int<lower=1> K;  
  matrix[N, K] X;  
  int<lower=1> J_1[N];
  int<lower=1> N_1;
  int<lower=1> M_1;
  vector[N] Z_1_1;
  int<lower=1> J_2[N];
  int<lower=1> N_2;
  int<lower=1> M_2;
  vector[N] Z_2_1;
} 
transformed data { 
  int Kc = K - 1; 
  matrix[N, K - 1] Xc;  
  vector[K - 1] means_X;  
  for (i in 2:K) { 
    means_X[i - 1] = mean(X[, i]); 
    Xc[, i - 1] = X[, i] - means_X[i - 1]; 
  } 
} 
parameters { 
  vector[Kc] b;  
  real temp_Intercept;  
  real<lower=0> sigma;  
  vector<lower=0>[M_1] sd_1; 
  vector[N_1] z_1[M_1];  
  vector<lower=0>[M_2] sd_2;  
  vector[N_2] z_2[M_2];  
} 
transformed parameters { 
  vector[N_1] r_1_1 = sd_1[1] * (z_1[1]);
  vector[N_2] r_2_1 = sd_2[1] * (z_2[1]);
} 
model { 
  vector[N] mu = temp_Intercept + Xc * b;
  for (n in 1:N) { 
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
  } 
  target += student_t_lpdf(temp_Intercept | 3, 0, 10); 
  target += student_t_lpdf(sigma | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += student_t_lpdf(sd_1 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += normal_lpdf(z_1[1] | 0, 1);
  target += student_t_lpdf(sd_2 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += normal_lpdf(z_2[1] | 0, 1);
  target += normal_lpdf(Y | mu, sigma);
} 
generated quantities { 
  real b_Intercept = temp_Intercept - dot_product(means_X, b); 
} 

Now I would like to change the prior distribution for z_1 from Gaussian to Student’s t(1, 0, 10):

y_{ijk} \sim N(X_k b + z_{1i} + z_{2j}, \sigma) \\ z_{1i} \sim t(1, 0, 10), \quad z_{2j} \sim N(0, \theta_2)

I have trouble getting the code working. In addition to replacing

  target += student_t_lpdf(sd_1 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += normal_lpdf(z_1[1] | 0, 1);

with

  target += student_t_lpdf(z_1[1] | 1, 0, 10); 

what else should I change? I’m not so sure how to adjust the following

transformed parameters { 
  vector[N_1] r_1_1 = sd_1[1] * (z_1[1]);
  vector[N_2] r_2_1 = sd_2[1] * (z_2[1]);
} 

There are a few things I don’t understand with your model. First, how do you make sure z_1 and z_2 are identifiable from the data? Second, what exactly is the role of temp_Intercept in your program?

Hey! I think Chapter 22.7 of the Stan user guide (scroll down to “Reparameterizing a Student-t Distribution”) is what you are looking for.

A Student-t distribution with \nu=1 is equivalent to a Cauchy distribution, which has really fat tails. So the model might be hard to fit.
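To make the user-guide reparameterization concrete: if \tau \sim \text{Gamma}(\nu/2, \nu/2) and \alpha \sim N(0, 1), then \beta = \alpha / \sqrt{\tau} is Student-t with \nu degrees of freedom, and at \nu = 1 that density is exactly the Cauchy. A quick numerical sanity check of both identities in Python/SciPy (just a check, not Stan code):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

nu = 1.0  # degrees of freedom; nu = 1 is the Cauchy special case

def mixture_pdf(x, nu):
    # Marginal density of beta = alpha / sqrt(tau) with alpha ~ N(0, 1)
    # and tau ~ Gamma(nu/2, rate=nu/2): integrate the conditional
    # normal density over tau.
    integrand = lambda t: (stats.norm.pdf(x, scale=1.0 / np.sqrt(t))
                           * stats.gamma.pdf(t, a=nu / 2, scale=2.0 / nu))
    val, _ = quad(integrand, 0, np.inf)
    return val

for x in (-3.0, 0.0, 1.5):
    # The gamma mixture reproduces the Student-t density...
    assert abs(mixture_pdf(x, nu) - stats.t.pdf(x, df=nu)) < 1e-6
    # ...and for nu = 1 the Student-t density equals the Cauchy density.
    assert abs(stats.t.pdf(x, df=1) - stats.cauchy.pdf(x)) < 1e-12
```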

1 Like

@maxbiostat Thanks for pointing out the confusion. I reformulated the model, and hope it’s clearer now.

Does @Max_Mantei’s suggestion help?

Thanks @Max_Mantei and @maxbiostat for the suggestion. Conceptually I get the idea, but I’m still struggling with the implementation. I tried the following, but as a beginner I could not finish it:

data { 
  int<lower=1> N;  
  vector[N] Y;  
  int<lower=1> K;  
  matrix[N, K] X;  
  int<lower=1> J_1[N];
  int<lower=1> N_1;
  int<lower=1> M_1;
  vector[N] Z_1_1;
  int<lower=1> J_2[N];
  int<lower=1> N_2;
  int<lower=1> M_2;
  vector[N] Z_2_1;
} 
transformed data { 
  int Kc = K - 1; 
  matrix[N, K - 1] Xc;  
  vector[K - 1] means_X;  
  for (i in 2:K) { 
    means_X[i - 1] = mean(X[, i]); 
    Xc[, i - 1] = X[, i] - means_X[i - 1]; 
  } 
} 
parameters { 
  vector[Kc] b;  
  real temp_Intercept;  
  real<lower=0> sigma;  
  
  real<lower=0> nu;
  real<lower=0> tau;
  real alpha;

  vector[N_1] z_1[M_1];  
  vector<lower=0>[M_2] sd_2;  
  vector[N_2] z_2[M_2];  
} 
transformed parameters { 
  vector[N_1] r_1_1 = sd_1[1] * (z_1[1]);
  vector[N_2] r_2_1 = sd_2[1] * (z_2[1]);
} 
model { 
  vector[N] mu = temp_Intercept + Xc * b;
  real half_nu;
  half_nu = 0.5 * nu;
  for (n in 1:N) { 
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
  } 
  target += student_t_lpdf(temp_Intercept | 3, 0, 10); 
  target += student_t_lpdf(sigma | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += student_t_lpdf(sd_1 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10);

  target += gamma_lpdf(tau[1] | half_nu, half_nu);
  alpha[1] ~ std_normal();
  beta[1] = alpha[1] / sqrt(tau[1]);

  target += student_t_lpdf(sd_2 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += normal_lpdf(z_2[1] | 0, 1);
  target += normal_lpdf(Y | mu, sigma);
} 
generated quantities { 
  real b_Intercept = temp_Intercept - dot_product(means_X, b); 
} 

Could you help me complete it?

@Max_Mantei Thanks for the explanation in the other thread about the AR model, I’ll respond later.

The problem is that the program is a bit too complex to debug at the moment.

As far as I understand it, you want a somewhat simple multilevel model, with random intercepts z_1 and z_2. And some of the code juggling is to specify a non-centred parametrisation.

I recommend you start your program very simple and then build up to the full model. As it stands I unfortunately do not have time to write a full program to fit the model you have in mind. Maybe @bbbales2 or @stijn have some free time on their hands to help. If not, I’ll try to help out when I have the chance.

It’d also help a lot if you could provide some sample data so we can test things. It doesn’t need to be real data; simulated data with the same structure will do just fine.
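For instance, a minimal sketch in Python/NumPy of data with the same structure as the model in this thread (all sizes and coefficient values here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up dimensions: N observations, K predictors (incl. intercept),
# N_1 and N_2 levels for the two crossed grouping factors.
N, K, N_1, N_2 = 200, 3, 10, 8
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
b = np.array([1.0, 0.5, -0.3])          # made-up population-level effects
J_1 = rng.integers(0, N_1, size=N)      # group index for factor 1 (0-based)
J_2 = rng.integers(0, N_2, size=N)      # group index for factor 2 (0-based)

z_1 = rng.normal(0.0, 1.0, size=N_1)    # random intercepts, factor 1
z_2 = rng.normal(0.0, 0.5, size=N_2)    # random intercepts, factor 2
sigma = 0.7                              # residual SD

Y = X @ b + z_1[J_1] + z_2[J_2] + rng.normal(0.0, sigma, size=N)
```

Data like this (with the indices shifted to 1-based, J_1 + 1 and J_2 + 1, for Stan) would be enough to exercise the model.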

1 Like

Like @maxbiostat said, looks like you want to use student-t distributions for the random effects? It also looks like you’re using brms.

I don’t know if that’s officially supported, but check this post: https://github.com/paul-buerkner/brms/issues/231#issuecomment-388615502 . I tried it and it still seems to work.

2 Likes

Some random comments.

  • On the original post: I think the literal problem of replacing the prior for z_1 is not too difficult: you go from z_{1i} \sim N(0, \theta_1) to z_{1i} \sim t(1, 0, 10), and you were on the right track in your original post. Notice that you are no longer estimating \theta_1, which is sd_1[1] in the code. You do need to remove it from the parameters block, and the transformed parameters block becomes:
transformed parameters { 
  vector[N_1] r_1_1 = (z_1[1]);
  vector[N_2] r_2_1 = sd_2[1] * (z_2[1]);
} 
  • This is not the most efficient implementation of the Cauchy prior, so you could then look into more efficient implementations, as you did above. But I would first try to get the simple switch in priors working.

  • I do think something odd is going on with the code. There should be M_1 vectors z_1 of length N_1, but I think the code only uses one of those M_1 vectors, z_1[1]. There are no priors on z_1[2:M_1] and they are not used in constructing the target. That might make sense but it makes me a bit nervous.
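For reference, one of the more efficient implementations the user guide describes draws a uniform variate and pushes it through the Cauchy inverse CDF: x = mu + scale * tan(u) with u ~ Uniform(-pi/2, pi/2). A quick numerical check of that inverse-CDF identity, using the location 0, scale 10 values discussed in this thread:

```python
import numpy as np
from scipy import stats

loc, scale = 0.0, 10.0
for p in (0.1, 0.5, 0.75, 0.9):
    # tan(pi * (p - 1/2)) is the standard Cauchy quantile function, so
    # loc + scale * tan(...) inverts the Cauchy(loc, scale) CDF.
    x = loc + scale * np.tan(np.pi * (p - 0.5))
    assert abs(x - stats.cauchy.ppf(p, loc=loc, scale=scale)) < 1e-9
```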

1 Like

Thanks everyone for your help! Per the suggestion from @bbbales2, I did manage to run the analysis through brms by specifying the grouped effect with (1 | gr(item, dist = "student")). Below is the Stan code generated by brms:

data { 
  int<lower=1> N;  // total number of observations 
  vector[N] Y;  // response variable 
  int<lower=1> K;  // number of population-level effects 
  matrix[N, K] X;  // population-level design matrix 
  // data for group-level effects of ID 1
  int<lower=1> J_1[N];
  int<lower=1> N_1;
  int<lower=1> M_1;
  vector[N] Z_1_1;
  // data for group-level effects of ID 2
  int<lower=1> J_2[N];
  int<lower=1> N_2;
  int<lower=1> M_2;
  vector[N] Z_2_1;
  int prior_only;  // should the likelihood be ignored? 
} 
transformed data { 
  int Kc = K - 1; 
  matrix[N, K - 1] Xc;  // centered version of X 
  vector[K - 1] means_X;  // column means of X before centering 
  for (i in 2:K) { 
    means_X[i - 1] = mean(X[, i]); 
    Xc[, i - 1] = X[, i] - means_X[i - 1]; 
  } 
} 
parameters { 
  vector[Kc] b;  // population-level effects 
  real temp_Intercept;  // temporary intercept 
  real<lower=0> sigma;  // residual SD 
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // unscaled group-level effects
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  vector[N_2] z_2[M_2];  // unscaled group-level effects
  // parameters for student-t distributed group-level effects
  real<lower=1> df_2;
  real<lower=0> udf_2;
} 
transformed parameters { 
  // group-level effects 
  vector[N_1] r_1_1 = sd_1[1] * (z_1[1]);
  // group-level effects 
  vector[N_2] r_2_1 = sqrt(df_2 * udf_2) * sd_2[1] * (z_2[1]);
} 
model { 
  vector[N] mu = temp_Intercept + Xc * b;
  for (n in 1:N) { 
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
  } 
  // priors including all constants 
  target += student_t_lpdf(temp_Intercept | 3, 0, 10); 
  target += student_t_lpdf(sigma | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += student_t_lpdf(sd_1 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += normal_lpdf(z_1[1] | 0, 1);
  target += student_t_lpdf(sd_2 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += normal_lpdf(z_2[1] | 0, 1);
  target += gamma_lpdf(df_2 | 2, 0.1); 
  target += inv_chi_square_lpdf(udf_2 | df_2);
  // likelihood including all constants 
  if (!prior_only) { 
    target += normal_lpdf(Y | mu, sigma);
  } 
} 
generated quantities { 
  // actual population-level intercept 
  real b_Intercept = temp_Intercept - dot_product(means_X, b); 
} 
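The sqrt(df_2 * udf_2) factor in the generated code is a scale-mixture construction: if udf_2 ~ inv-χ²(df_2) and z ~ N(0, 1), then sqrt(df_2 * udf_2) * z is Student-t with df_2 degrees of freedom, which sd_2[1] then rescales. A numerical sanity check of that identity in Python/SciPy (using invgamma, since inv-χ²(ν) is InverseGamma(ν/2, scale = 1/2); df = 4 here is just an example value):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

df = 4.0  # example degrees of freedom

def mixture_pdf(x, df):
    # Marginal density of sqrt(df * u) * z with z ~ N(0, 1) and
    # u ~ inv-chi-square(df) = InverseGamma(df/2, scale=1/2):
    # integrate the conditional normal density over u.
    integrand = lambda u: (stats.norm.pdf(x, scale=np.sqrt(df * u))
                           * stats.invgamma.pdf(u, a=df / 2, scale=0.5))
    val, _ = quad(integrand, 0, np.inf)
    return val

for x in (-2.0, 0.0, 1.0):
    assert abs(mixture_pdf(x, df) - stats.t.pdf(x, df=df)) < 1e-6
```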

I meant to use the Student’s t-distribution to handle outliers, but it seems that the Student’s t is still not strong enough to resist the impact of those outliers. Any way to further strengthen the resistance?

1 Like

Is there a reason to think that the outliers should be handled by the random effects and not by the observational model? Or am I thinking about this incorrectly?
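To see why moving the heavy tails into the observational model helps with outliers: the log density of a Student-t falls off much more slowly than a normal’s, so a single extreme observation drags a t-based fit far less. A quick illustration in Python/SciPy (the df = 3 and unit scale here are just illustrative, not taken from the model above):

```python
from scipy import stats

# Log-density "penalty" for an observation 10 scale units from the mean.
lp_normal = stats.norm.logpdf(10.0, loc=0.0, scale=1.0)
lp_t3 = stats.t.logpdf(10.0, df=3, loc=0.0, scale=1.0)

# The normal likelihood punishes the outlier enormously (about -51 log
# units), the t(3) likelihood only mildly (about -8), so a t observational
# model is far less influenced by a single extreme point.
assert lp_normal < -50
assert lp_t3 > -9
```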

4 Likes

Indeed I was barking up the wrong tree! Thanks a lot, everyone, for your kind help!

1 Like

Haha well the observational model could be barking up another wrong tree too. Good luck either way! Feel free to ask follow-up questions.

1 Like