Scale-mixture of normal vs. student t

jgodwin · February 20, 2019, 2:48pm

Hi all,

I’m attempting to fit a student t_4 model in Stan as a scale mixture of normals. The motivation is that I will do this also in another program, but I created a simpler example/simulation to test in Stan as Stan has the ability to fit a t likelihood. However, I am finding that Stan does not agree with itself when the two ways of fitting the model are compared.

Model 1:

y_{si} \vert \mu, V_s, \kappa_s \sim N(\mu, V_s \times \kappa_s^{-1})
\kappa_s \sim \Gamma(\nu/2, \nu/2)
\mu \sim N(0, 10^5)

where i indexes observation within group s, V_s is a known, fixed scaling parameter, \kappa_s is a group specific precision parameter to be estimated \mu is a common mean to all groups and \nu is the degrees of freedom of the induced t distribution (in this case \nu = 4).

Model 2:

y_{si} \vert \mu, V_s, \nu \sim t_{\nu}(location = \mu, scale = \sqrt{V_s})
\mu \sim N(0, 10^5)

Fitting these two models in Stan give different estimates for \mu. See the attached .png file.

Does anyone know why this might be? I’m attaching a data file and Stan code for the two models.

For what it’s worth, I know the marginal variance in these cases are different (by a factor of \nu/(\nu -2)), but the problem persists when correcting for that.

Thank you!

stanModel_10_student.stan (1.4 KB)
stanModel_10.stan (1.3 KB)

dat.csv (95.7 KB)

ldeschamps · February 20, 2019, 3:44pm

Hello!

I tried with narrower prior sd for mu (yours seems to be quite wide), and different number of iterations (thinking about precision in posterior estimates), but it did not change anything…

Then, I will let wiser statisticians answer :)

Luca

roualdes · February 20, 2019, 6:02pm

Drawing on Michael Betancourt’s Fitting The Cauchy Distribution, I believe you want

y \sim N(\mu, \kappa^{-1/2})
\kappa \sim \Gamma(\nu/2, V/2)

instead of

y \sim N(\mu, V \times \kappa^{-1/2})
\kappa \sim \Gamma(\nu/2, \nu/2)

I edited your stanModel_10.stan to stanModel_10_mix.stan (1.4 KB), from which I got very similar results for the parameter \mu from the student_t version and this mixture version, but with increased effective sample size as hoped for.

jgodwin · February 21, 2019, 1:09am

Thanks for trying! :)

Jessica

jgodwin · February 21, 2019, 10:56am

I think it depends on whether one wants to fit a Cauchy distribution or a t-distribution wth defined moments. It’s not that I simply want a heavy-tailed distribution. The V’s are important in our application and the t-mixture gives us a practical interpretation of the \kappa's. The normal mixture I had written out is the proper way to induce a t_4 via normal mixtures, so I still don’t understand why Stan’s t_4 likelihood would give differing answers.

See the plot below, I still get differing answers with the mixtures vs the Student t distribution. A Cauchy is a t_1 so I have compared it with that.

roualdes · February 22, 2019, 5:06am

You’re right, there seems to be something strange going on here. Mathematically, both your and my mixtures (accounting for my typo above) give Student t distributions. I should have written

\kappa \sim \Gamma(\nu/2, \nu V / 2)

Unfortunately, my typo made me think things were working, but I have now reliably reproduced your results and don’t have any reasonable explanation.

jgodwin · February 22, 2019, 8:39am

It’s very strange! Thanks for giving it a try!

maxbiostat · February 22, 2019, 1:57pm

@betanalpha, this seems up your alley. Any insight?

hhau · February 22, 2019, 2:07pm

Models agree if you put a hierarchal prior structure on the mean:

which is to say you do something like this:

parameters {
  real mu[n];
  real mu_mean;
  real <lower=0> mu_sd;           
  real<lower=0> kappa[n];      // precision for data source
  }
model {
  kappa[1] ~ gamma(2,2 * V[1]);
  kappa[2] ~ gamma(2,2 * V[2]);
  kappa[3] ~ gamma(2,2 * V[3]);
  kappa[4] ~ gamma(2,2 * V[4]);
  kappa[5] ~ gamma(2,2 * V[5]);
  kappa[6] ~ gamma(2,2 * V[6]);
  kappa[7] ~ gamma(2,2 * V[7]);
  kappa[8] ~ gamma(2,2 * V[8]);
  kappa[9] ~ gamma(2,2 * V[9]);
  kappa[10] ~ gamma(2,2 * V[10]);
  // mu ~ normal(0,10000);
  y1 ~ normal(mu[1], sqrt(V[1] * 1/kappa[1]));
  y2 ~ normal(mu[2], sqrt(V[2] * 1/kappa[2]));
  y3 ~ normal(mu[3], sqrt(V[3] * 1/kappa[3]));
  y4 ~ normal(mu[4], sqrt(V[4] * 1/kappa[4]));
  y5 ~ normal(mu[5], sqrt(V[5] * 1/kappa[5]));
  y6 ~ normal(mu[6], sqrt(V[6] * 1/kappa[6]));
  y7 ~ normal(mu[7], sqrt(V[7] * 1/kappa[7]));
  y8 ~ normal(mu[8], sqrt(V[8] * 1/kappa[8]));
  y9 ~ normal(mu[9], sqrt(V[9] * 1/kappa[9]));
  y10 ~ normal(mu[10], sqrt(V[10] * 1/kappa[10]));

  mu ~ normal(mu_mean, mu_sd);
  mu_sd ~ normal(0, 1);
}

and looking at the boxplot of mu_mean. (An aside - use matrices! The code repetition is not ideal).

So there is definitely something strange going on due to some parameters being constant and some varying from data set to data set. I would suggest doing the maths to see why you cannot manipulate your current joint normal/gamma formulation to the desired student-t formulation. (Like the manipulations here: https://stats.stackexchange.com/questions/52906/student-t-as-mixture-of-gaussian, but being careful with the indexing because you have known V[n]).

jgodwin · February 22, 2019, 2:10pm

Thank you! This is great!

I had the non-hierarchy on the mean, because I was trying to replicate how I would implement this in INLA (which I am doing because my more complicated model uses the sparse matrix representation in smoothing effects like ICARs and RWs). But, I am glad to have this special case figured out.

Yes, thank you for the recommendations in my Stan code. This is my first time using Stan so I was not sure about loops, etc. :)

hhau · February 22, 2019, 2:22pm

Something like this is easier to read:

data {
  int <lower = 0> N;
  int <lower = 0> n;
  matrix [n, N] y;
  real <lower = 0> V [n]; 
}

parameters {
  real mu [n];
  real mu_mean;
  real <lower = 0> mu_sd;
  real <lower = 0> kappa [n];
}

model {
  for (ii in 1 : n) {
    y[ii, ] ~ normal(mu[ii], sqrt(V[ii]  * (1 / kappa[ii])));
    kappa[ii] ~ gamma(2, 2 * V[ii]);
  }

  mu ~ normal(mu_mean, mu_sd);
  mu_sd ~ normal(0, 1);
  mu_mean ~ normal(0, 5);
}

and you can reflow the data in the csv you have to the appropriate form pretty quickly:

data_in <- read.csv("dat.csv")
stan_dat <- as.list(data_in)
stan_dat$N <- 500
stan_dat$n <- 10
stan_dat$V <- rep(0.2, 10)
stan_dat[["y"]] <- matrix(
  unlist(stan_dat[5 : length(stan_dat)]),
  nrow = stan_dat$n,
  byrow = T
)

Topic		Replies	Views
Coping with varying curvature Modeling fitting-issues	9	750	August 24, 2019
Mixture model: Different perspective to the schools example Modeling	35	1250	April 2, 2020
Difference in means / t-test and scaling of data Modeling	2	1257	May 25, 2017
Mixture of Normals as distribution for latent variable Modeling specification , mixture	4	548	April 29, 2021
Modeling multi modal distribution Modeling fitting-issues	3	637	March 19, 2023

Scale-mixture of normal vs. student t

Related topics