# ep-stan: separating out the normalized std

Tuomas, there is a common pattern I see throughout all the ep-stan sample models (see m4b for example). The mu and sigma are passed as elements of phi, and are therefore estimated using the distributed worker nodes. But there is always a global parameter, say eta, which is not incorporated into phi and is instead used in the transformed parameters block:

```stan
alpha <- mu_a + eta * sigma_a;
```

and eta itself is sampled as

```stan
eta ~ normal(0, 1);
```

May I ask why eta is treated as a global parameter?

This is an example of the ‘non-centred’ parameterization, which is used to make sampling from some (not all!) posteriors easier and more efficient. There’s a great overview on p. 341 of the Stan manual (v2.17).

Essentially, in some cases it can be difficult for Stan to sample from a `normal(mu_a, sigma_a)` distribution. Thankfully, however, `normal(mu_a, sigma_a)` is equivalent to `mu_a + normal(0, 1) * sigma_a`. This means that Stan only needs to sample from a `normal(0, 1)` distribution (which it’s pretty great at).
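To make the equivalence concrete, here is a quick numerical sanity check in plain Python (a hypothetical illustration with invented values, not part of any ep-stan model): draws of `mu_a + eta * sigma_a` with `eta ~ normal(0, 1)` match direct draws from `normal(mu_a, sigma_a)` in mean and standard deviation.

```python
import random

random.seed(0)
mu_a, sigma_a = 2.0, 3.0  # invented example values
n = 200_000

# Direct (centered) draws from normal(mu_a, sigma_a)
centered = [random.gauss(mu_a, sigma_a) for _ in range(n)]
# Non-centered draws: eta ~ normal(0, 1), then alpha = mu_a + eta * sigma_a
non_centered = [mu_a + random.gauss(0.0, 1.0) * sigma_a for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

print(mean(centered), std(centered))
print(mean(non_centered), std(non_centered))
```

Both print statements give a mean near 2 and a standard deviation near 3 (up to Monte Carlo error): the two parameterizations describe the same distribution, but the sampler only ever has to deal with `normal(0, 1)` in the non-centered version.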

For the parameters that you gave, this is equivalent to, but likely more efficient than, `alpha ~ normal(mu_a, sigma_a)`.

Ok, I accept that sampling from normal(0, 1) may be easier, but why are mu_a and sigma_a estimated using phi (distributed using EP) while eta is just global?

`mu_a` and `sigma_a` are estimated through phi (multivariate normal) because the two estimates are correlated (not independent). `eta` is estimated separately (univariate normal) because each value of eta is assumed to be uncorrelated with all of the others (independent).
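A minimal sketch of that distinction in plain Python (with invented numbers, purely illustrative): a draw of phi uses a full covariance matrix, so its components move together, while the eta components are drawn independently from standard normals.

```python
import random

random.seed(2)

# Invented 2x2 covariance for phi = (mu_a, log_sigma_a), with correlation
cov = [[1.0, 0.6],
       [0.6, 1.0]]

# Cholesky factor L of cov (cov = L L^T), written out by hand for the 2x2 case
l11 = cov[0][0] ** 0.5
l21 = cov[1][0] / l11
l22 = (cov[1][1] - l21 ** 2) ** 0.5

# Correlated draw of phi: one shared z feeds both components through L
z = [random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]
phi_draw = [l11 * z[0], l21 * z[0] + l22 * z[1]]

# Independent draws of eta: no covariance structure at all
eta = [random.gauss(0.0, 1.0) for _ in range(5)]

print(phi_draw, eta)
```

The off-diagonal entries of `cov` are exactly what phi carries and eta does not, which is why phi needs the multivariate treatment.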

Andr, are you familiar with ep-stan? My questions are directly related to how some variables are solved through expectation propagation in a distributed fashion while the variable eta is not. I don’t believe your response is along those lines.

I especially have a problem visualizing why etb[j] in model m4b is calculated per group j while a single etb is shared among all D dimensions of beta[j]:

```stan
for (j in 1:J) {
  beta[j] <- mu_b + etb[j] .* sigma_b;
}
```

where beta[j], mu_b, and sigma_b are all of dimension D. The standard deviation for each dimension should be completely independent, so I fail to see the reasoning for sharing the normalized standard deviation.

That’s not quite what’s happening here, since etb[j] is also of dimension D:

```stan
vector[D] etb[J];
```

So there is an independent `etb` for each dimension D in each group J.
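To make the indexing concrete, here is a hypothetical plain-Python analogue of that loop (values invented): etb is a J-by-D array, so every group and every dimension gets its own independent standardized draw, and the elementwise `.*` just pairs up matching dimensions.

```python
import random

random.seed(1)
J, D = 3, 4  # invented sizes
mu_b = [0.5, -1.0, 2.0, 0.0]     # invented, dimension D
sigma_b = [1.0, 0.5, 2.0, 1.5]   # invented, dimension D

# One independent standard-normal draw per group j AND per dimension d
etb = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(J)]

# beta[j] = mu_b + etb[j] .* sigma_b, done elementwise per dimension
beta = [[mu_b[d] + etb[j][d] * sigma_b[d] for d in range(D)]
        for j in range(J)]

print(len(beta), len(beta[0]))  # J groups, each of dimension D
```

Nothing is shared across dimensions here: `etb[j][d]` is a distinct draw for each (j, d) pair, exactly as `vector[D] etb[J]` declares.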

Taking the example from the first post:

```stan
alpha <- mu_a + eta * sigma_a;
eta ~ normal(0, 1);
```

The aim is to estimate the variable `alpha`, which is normally distributed with mean `mu_a` and std `sigma_a`. These are the three parameters that need to be returned from each of the K partitions/nodes. `mu_a` and `sigma_a` can be sampled easily enough, but sampling a variable with a `normal(mu_a, sigma_a)` distribution is difficult, and so the non-centered parameterization (and `eta`) is used. `eta` does not need to be returned from each of the partitions/nodes, because it is simply a tool used to construct the variable `alpha`, which is the variable of interest.

TL;DR: from each partition/node we want to estimate the distribution of `alpha`. Rather than directly sampling `alpha` in each partition, the non-centered parameterization uses `eta` to construct `alpha`. Each partition then returns `alpha`.

Thanks, my bad for missing the dimension on etb[j]. That makes much more sense. I still don’t have the intuition for why eta and etb do not need to go through phi. Let me ask this way: in the equation

```stan
alpha <- mu_a + eta * sigma_a;
```

alpha is already a fixed value, not a random variable, is that right? In other words, eta was already sampled from

```stan
eta ~ normal(0, 1);
```

Doesn’t the exact sampled value for eta come from the input evidence involving the complicated likelihood? I am not getting why you can pick any sample from the standard Gaussian and simply plug it in to derive alpha.

Upon further inspection, I am more puzzled by the fact that the res_d_model.npz file only gives you mu_phi and cov_phi. How, then, do I get the actual values of alpha and beta? These are the parameters that the model is intended to derive.

Can I also ask: if you compare m1a and m4b, the first calculates beta directly from phi, while the second goes through this mu + normal(0, 1) * sigma calculation. How do you decide which method to use? Is the direct derivation of beta good enough?

After a few rounds of internal debate, I totally get your points. It’s a matter of the full posterior vs. a simple maximum a posteriori.

I am leaving my comments in place, so I can revisit and get back to my thought patterns.