Sufficient Statistics for Normal Case

Hello,
I do not think I have seen this solved but new to Stan and needed a confirmation:
For a normal hierarchical case say
Y ~ N(theta, sigma^2) with theta unknown and sigma^2 unknown.
We know that the sample mean or Y-bar ~ N(theta, sigma^2 /n) where n is the sample size and

(n-1)*S^2 / sigma^2 ~ Chi-square (degrees of freedom=(n-1)). With S^2 being the sample variance.

With a variable transformation we know under the normal case, S^2 ~ gamma((n-1)/2 , 2*sigma^2/(n-1))
How do you specify this in your model statement using β€œtarget +=” .
Here was my implementation:

model
{

target += normal_lpdf(y_bar|theta, sqrt(sigma2inv(n)));
target += gamma_lpdf(S|n
.5-.5,(n-1)inv(2.0sigma2));

}

Further wanted to ask if I used the inv function correctly here or if there could be an additional function for specifying normal sufficient statistics(I saw a deprecated version). Last but not least, does it matter if you use an array or vector for later calculations of a parameter?
Any help would be appreciated. For full stan code see below:

hb ← ’
data
{
//Number of strata
int<lower=2> K;

vector<lower=0>[K] n_e; //sample sizes
vector[K] y_bar_e; //sample means
vector<lower=0> [K]S_e; // sample variances

//pre-specified constants
real k_0;
real b_01;
real b_02;
real b_03;
real b_04;
real<lower=0> v_0;
}

parameters
{
//level 1
vector[K] theta_ek;
real<lower=0.001> sigma2_e;

//level 2
vector[K] mu_k;
vector[K] tau2_k_i;

//level 3
real mu;
real<lower=0> phi2_i;
real<lower=0> eta2_i;
}

model
{
// priors
eta2_i ~ gamma(b_03, b_04);
phi2_i ~ gamma(b_01, b_02);
mu_k ~ normal(2,sqrt(inv(phi2_i)));
tau2_k_i ~ gamma(v_00.5,v_00.5*inv(eta2_i));

theta_ek ~ normal(mu_k,sqrt(inv(tau2_k_i)));
sigma2_e ~ inv_gamma(.01,.01);

// likelihood
for (i in 1:K)
{
// Sample Means
target += normal_lpdf(y_bar_e[i]|theta_ek[i], sqrt(sigma2_einv(n_e[i])));
//Sample Variances
target += gamma_lpdf(S_e[i]|n_e[i]
.5-.5,(n_e[i]-1)* inv(2.0*sigma2_e));

}
}
’

See this post. And maybe this one too.

1 Like

Thank you Mike! This helped. Do you know why I can not use target += as coded now?
The model runs. Could also just not be a good model, but convergence is the issue and I think it’s because of the target +=.

I’ve re-explored this and can confirm that you can use sufficient statistics for the target += command.
Below script 1 uses the normal likelihood and script 2 uses the normal sufficient statistics. Hope this can clear any doubts and help others better understand the target += command. By running both examples you will see the results agree.
Example taken from: Link.

Script 1

model_normal ← ’
data {
int N;
real x[N];
real mu0;
real sigma0;
real alpha0;
real beta0;
}

parameters {
real mu;
real<lower=0> sigma_2;
}

model {
x ~ normal(mu, sqrt(sigma_2));
mu ~ normal(mu0, sigma0);
sigma_2 ~ inv_gamma(alpha0, beta0);
}

’
mydata ← list(
N = 10,
x=c(91, 85, 72, 87, 71, 77, 88, 94, 84, 92),
mu0 = 75,
sigma0 = 50,
alpha0 = 5,
beta0 = 150
)

fit ← stan(
model_code = model_normal,
data = mydata,
warmup = 1000,
iter = 5000
)

Script 2

model_normal1 ← ’
data {
int N;
real xbar;
real s2;
real mu0;
real sigma0;
real alpha0;
real beta0;
}

parameters {
real mu;
real<lower=0> sigma_2;
}

model {

target += normal_lpdf(xbar|mu, sqrt(sigma_2/(N1.0)));
target += gamma_lpdf(s2|N
.5-.5, N*.5inv(sigma_2)-inv(2sigma_2));

mu ~ normal(mu0, sigma0);
sigma_2 ~ inv_gamma(alpha0, beta0);
}

’

real xbar;
real s2;
mydata ← list(
N = 10,
xbar=mean(c(91, 85, 72, 87, 71, 77, 88, 94, 84, 92)),
s2=var(c(91, 85, 72, 87, 71, 77, 88, 94, 84, 92)),
mu0 = 75,
sigma0 = 50,
alpha0 = 5,
beta0 = 150
)

fit ← stan(
model_code = model_normal1,
data = mydata,
warmup = 1000,
iter = 5000
)

1 Like