# Difference in means / t-test and scaling of data

Hi,

I am trying to estimate the difference between two groups. I’ve made the following Stan model,

``````data {
int N1;
real y1[N1];
real y1_mean;

int N2;
real y2[N2];
real y2_mean;
}
parameters {
real<lower=0> mu_tau;
real<lower=0> sigma_tau;

real<lower=0> sigma1;
real<lower=0> sigma2;

real<lower=0> mu_scale1;
real<lower=0> mu_scale2;

real mu1;
real mu2;
}
model {
/* Prior specification */

/* Priors related to mean */
mu_tau ~ normal(0, 1);

mu_scale1 ~ normal(0, mu_tau);
mu_scale2 ~ normal(0, mu_tau);

mu1 ~ normal(y1_mean, mu_scale1);
mu2 ~ normal(y2_mean, mu_scale2);

/* Priors related to scale */
sigma_tau ~ normal(0, 10);

sigma1 ~ normal(0, sigma_tau);
sigma2 ~ normal(0, sigma_tau);

/* Likelihood */
y1 ~ normal(mu1, sigma1);
y2 ~ normal(mu2, sigma2);
}
generated quantities {
real diff_mu;
diff_mu = mu1 - mu2;
}
``````

It’s not the prettiest model, but its working. However, as I understand HMC works best with standardized data, i.e. mean=0 and sd=1. In the data I’m working with the mean is between 50-75, and the standard deviation is somewhere in the range 10-15. Thus, it’s a bit hard now to judge what is a informative/non-informative prior.

Are there any tricks I can use here? Standardizing the data and then de-standardizing it after model fit would not make much sense, I believe, since I actually wish to estimate the mean.

Thanks!

David

Hi!

With regards to the following lines
`mu1 ~ normal(y1_mean, mu_scale1); mu2 ~ normal(y2_mean, mu_scale2);`

Where does `y1_mean` and `y2_mean` come from? If these are the means of `y1` and `y2`, it would be odd if the priors for `mu1,mu2` are centered around these values. Or do they come from another data set?

If you are asking about the so-called “Matt-Trick”, this pertains to reparameterising parameters - not data. See section 26.6 in the Stan user manual. I wouldn’t worry too much about it in your case, as long as the convergence diagnostics are looking good and the estimation is not too slow.

best,
Ole-Petter

1 Like

It actually works best with standardized posteriors; standardizing the predictors can help with this. And it’s not just HMC, the same thing applies to Metropolis and Gibbs.

So you can non-center with:

``````mu_scale1_std ~ normal(0, 1);
mu_scale1 = mu_scale1_std * mu_tau;
``````

You’ll probably have problems with

``````mu1 ~ normal(..., mu_scale1);
``````

because you only have a single `mu1`, which won’t let you estimate a scale.