Changing prior from Gaussian to Student's t

Gang · October 22, 2019, 4:54pm

My original model is

y_{ijk} \sim (X_k b+z_{1i}+z_{2j}, \sigma)\\ z_{1i} \sim N(0, \theta_1), z_{2j} \sim N(0, \theta_2)

with the following Stan code

data { 
  int<lower=1> N;  
  vector[N] Y;  
  int<lower=1> K;  
  matrix[N, K] X;  
  int<lower=1> J_1[N];
  int<lower=1> N_1;
  int<lower=1> M_1;
  vector[N] Z_1_1;
  int<lower=1> J_2[N];
  int<lower=1> N_2;
  int<lower=1> M_2;
  vector[N] Z_2_1;
} 
transformed data { 
  int Kc = K - 1; 
  matrix[N, K - 1] Xc;  
  vector[K - 1] means_X;  
  for (i in 2:K) { 
    means_X[i - 1] = mean(X[, i]); 
    Xc[, i - 1] = X[, i] - means_X[i - 1]; 
  } 
} 
parameters { 
  vector[Kc] b;  
  real temp_Intercept;  
  real<lower=0> sigma;  
  vector<lower=0>[M_1] sd_1; 
  vector[N_1] z_1[M_1];  
  vector<lower=0>[M_2] sd_2;  
  vector[N_2] z_2[M_2];  
} 
transformed parameters { 
  vector[N_1] r_1_1 = sd_1[1] * (z_1[1]);
  vector[N_2] r_2_1 = sd_2[1] * (z_2[1]);
} 
model { 
  vector[N] mu = temp_Intercept + Xc * b;
  for (n in 1:N) { 
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
  } 
  target += student_t_lpdf(temp_Intercept | 3, 0, 10); 
  target += student_t_lpdf(sigma | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += student_t_lpdf(sd_1 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += normal_lpdf(z_1[1] | 0, 1);
  target += student_t_lpdf(sd_2 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += normal_lpdf(z_2[1] | 0, 1);
  } 
} 
generated quantities { 
  real b_Intercept = temp_Intercept - dot_product(means_X, b); 
}

Now I would like to change the prior distribution for z_1 from Gaussian to Student’s t(1,0, 10)

y_{ijk} \sim (X_k b+z_{1i}+z_{2j}, \sigma) \\ z_{1i} \sim t(1, 0, 10), z_{2j} \sim N(0, \theta_2)

I have trouble getting the code working. In addition to replacing

  target += student_t_lpdf(sd_1 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += normal_lpdf(z_1[1] | 0, 1);

with

  target += student_t_lpdf(z_1[1] | 1, 0, 10);

what else should I change? I’m not so sure how to adjust the following

transformed parameters { 
  vector[N_1] r_1_1 = sd_1[1] * (z_1[1]);
  vector[N_2] r_2_1 = sd_2[1] * (z_2[1]);
}

maxbiostat · October 22, 2019, 5:50pm

There are few things I don’t understand with your model. First, how do you make sure z_1 and z_2 are identifiable from the data? Secondly, what exactly is the role of temp_Intercept in your program?

Max_Mantei · October 22, 2019, 6:04pm

Hey! I think Chapter 22.7 of the Stan user guide (scroll down to “Reparameterizing a Student-t Distribution”) is what you are looking for.

A Student-t distribution with \nu=1 is equivalent to a Cauchy distribution, which has really fat tails. So the model might be hard to fit.

Gang · October 22, 2019, 6:05pm

@maxbiostat Thanks for pointing out the confusion. I reformulated the model, and hope it’s clearer now.

maxbiostat · October 22, 2019, 6:11pm

Does @Max_Mantei’s suggestion help?

Gang · October 22, 2019, 7:14pm

Thanks @Max_Mantei and @maxbiostat for the suggestion. Conceptually I get the idea, but I’m still struggling with the implementation. I tried the following, but as a beginner I could not finish it:

data { 
  int<lower=1> N;  
  vector[N] Y;  
  int<lower=1> K;  
  matrix[N, K] X;  
  int<lower=1> J_1[N];
  int<lower=1> N_1;
  int<lower=1> M_1;
  vector[N] Z_1_1;
  int<lower=1> J_2[N];
  int<lower=1> N_2;
  int<lower=1> M_2;
  vector[N] Z_2_1;
} 
transformed data { 
  int Kc = K - 1; 
  matrix[N, K - 1] Xc;  
  vector[K - 1] means_X;  
  for (i in 2:K) { 
    means_X[i - 1] = mean(X[, i]); 
    Xc[, i - 1] = X[, i] - means_X[i - 1]; 
  } 
} 
parameters { 
  vector[Kc] b;  
  real temp_Intercept;  
  real<lower=0> sigma;  
  
  real<lower=0> nu;
  real<lower=0> tau;
  real alpha;

  vector[N_1] z_1[M_1];  
  vector<lower=0>[M_2] sd_2;  
  vector[N_2] z_2[M_2];  
} 
transformed parameters { 
  vector[N_1] r_1_1 = sd_1[1] * (z_1[1]);
  vector[N_2] r_2_1 = sd_2[1] * (z_2[1]);
} 
model { 
  vector[N] mu = temp_Intercept + Xc * b;
  real half_nu;
  half_nu = 0.5 * nu;
  for (n in 1:N) { 
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
  } 
  target += student_t_lpdf(temp_Intercept | 3, 0, 10); 
  target += student_t_lpdf(sigma | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += student_t_lpdf(sd_1 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10);

  target += gamma_lpdf(tau[1] | half_nu, half_nu);
  alpha[1] ~ std_normal();
  beta[1] = alpha[1] / sqrt(tau[1]);

  target += student_t_lpdf(sd_2 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += normal_lpdf(z_2[1] | 0, 1);
  } 
} 
generated quantities { 
  real b_Intercept = temp_Intercept - dot_product(means_X, b); 
}

Could you help me complete it?

@Max_Mantei Thanks for the explanation in the other thread about the AR model, I’ll respond later.

maxbiostat · October 22, 2019, 7:47pm

The problem is that the program is a bit too complex to debug at the moment.

As far as I understand it, you want a somewhat simple multilevel model, with random intercepts z_1 and z_2. And some of the code juggling is to specify a non-centred parametrisation.

I recommend you start your program very simple and then build up to the full model. As it stands I unfortunately do not have time to write a full program to fit the model you have in mind. Maybe @bbbales2 or @stijn have some time free time on their hands to help. If not, I’ll try to help out when I have the chance.

It’d also help a lot if you could provide some sample data so we can test things. It doesn’t need to be real data; simulated data with the same structure will do just fine.

bbbales2 · October 23, 2019, 1:22am

Like @maxbiostat said, looks like you want to use student-t distributions for the random effects? It also looks like you’re using brms.

I don’t know if that’s officially supported, but check this post: https://github.com/paul-buerkner/brms/issues/231#issuecomment-388615502 . I tried it and it still seems to work.

stijn · October 23, 2019, 8:01am

Some random comments.

On the original post: I think the literal problem of replacing the prior for z_1 is not too difficult. Going from z_{1i} \sim N(0, \theta_1) to z_{1i} \sim t(1, 0, 10). You were on the right track in your original post. Notice that you are no longer estimating \theta_1, which is sd_1[1] in the code. You do need to remove it from the parameters block and the transformed parameters block will become.

transformed parameters { 
  vector[N_1] r_1_1 = (z_1[1]);
  vector[N_2] r_2_1 = sd_2[1] * (z_2[1]);
}

This not the most efficient implementation of the Cauchy prior so you could then look into more efficient implementations as you did above. But I would first try to get the simple switch in priors working.
I do think something odd is going on with the code. There should be M_1 vectors z1 of length N_1 but I think the code only uses 1 of those N1 vectors, z_1[1]. There are no priors on z_1[2:M1] and they are not used in constructing the target. That might make sense but it makes me a bit nervous.

Gang · October 23, 2019, 10:21pm

Thanks everyone for your help! Per the suggestion from @bbbales2, I did manage to run the analysis through brms by specifying the grouped effect with (1 | gr(item, dist = "student")). Below is the stan code generated by brms:

data { 
  int<lower=1> N;  // total number of observations 
  vector[N] Y;  // response variable 
  int<lower=1> K;  // number of population-level effects 
  matrix[N, K] X;  // population-level design matrix 
  // data for group-level effects of ID 1
  int<lower=1> J_1[N];
  int<lower=1> N_1;
  int<lower=1> M_1;
  vector[N] Z_1_1;
  // data for group-level effects of ID 2
  int<lower=1> J_2[N];
  int<lower=1> N_2;
  int<lower=1> M_2;
  vector[N] Z_2_1;
  int prior_only;  // should the likelihood be ignored? 
} 
transformed data { 
  int Kc = K - 1; 
  matrix[N, K - 1] Xc;  // centered version of X 
  vector[K - 1] means_X;  // column means of X before centering 
  for (i in 2:K) { 
    means_X[i - 1] = mean(X[, i]); 
    Xc[, i - 1] = X[, i] - means_X[i - 1]; 
  } 
} 
parameters { 
  vector[Kc] b;  // population-level effects 
  real temp_Intercept;  // temporary intercept 
  real<lower=0> sigma;  // residual SD 
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // unscaled group-level effects
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  vector[N_2] z_2[M_2];  // unscaled group-level effects
  // parameters for student-t distributed group-level effects
  real<lower=1> df_2;
  real<lower=0> udf_2;
} 
transformed parameters { 
  // group-level effects 
  vector[N_1] r_1_1 = sd_1[1] * (z_1[1]);
  // group-level effects 
  vector[N_2] r_2_1 = sqrt(df_2 * udf_2) * sd_2[1] * (z_2[1]);
} 
model { 
  vector[N] mu = temp_Intercept + Xc * b;
  for (n in 1:N) { 
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
  } 
  // priors including all constants 
  target += student_t_lpdf(temp_Intercept | 3, 0, 10); 
  target += student_t_lpdf(sigma | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += student_t_lpdf(sd_1 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += normal_lpdf(z_1[1] | 0, 1);
  target += student_t_lpdf(sd_2 | 3, 0, 10)
    - 1 * student_t_lccdf(0 | 3, 0, 10); 
  target += normal_lpdf(z_2[1] | 0, 1);
  target += gamma_lpdf(df_2 | 2, 0.1); 
  target += inv_chi_square_lpdf(udf_2 | df_2);
  // likelihood including all constants 
  if (!prior_only) { 
    target += normal_lpdf(Y | mu, sigma);
  } 
} 
generated quantities { 
  // actual population-level intercept 
  real b_Intercept = temp_Intercept - dot_product(means_X, b); 
}

I meant to use the Student’s t-distribution to handle outliers, but it seems that the Student’s t is still not strong enough to resist the impact of those outliers. Anyway to further strengthen the resistance?

bbbales2 · October 23, 2019, 10:33pm

Is there a reason to think that the outliers should be handled by the random effects and not by the observational model? Or am I thinking about this incorrectly?

Gang · October 24, 2019, 3:13am

Indeed I was barking up the wrong tree! Thanks a lot, everyone, for your kind help!

bbbales2 · October 24, 2019, 2:04pm

Haha well the observational model could be barking up another wrong tree too. Good luck either way! Feel free to ask follow up questions

Topic		Replies	Views
Rejecting Initial Value (Student_t distribution) Modeling	3	408	June 20, 2018
[Re-Post] Rejecting Initial Value (Student_t distribution) Modeling	8	485	June 20, 2018
Beginner request for reparameterization help Modeling	12	1135	January 3, 2018
Help drawing from the posterior after standardizing predictors and outputs Modeling	2	464	April 4, 2018
Slight specification changes resulting in warnings and unexpected results Modeling	10	1068	April 6, 2020

Changing prior from Gaussian to Student's t

Related topics