Why 2(d+1) parameters in ep Stan

Bhomass1 · October 30, 2017, 1:39am

Gelman’s paper on EP says there are 2(D+1) shared parameters for a model with D explanatory variables. By looking at sample code m1a.py, apparently the two extra shared parameters are sigma (std dev for y), and sigma_a (std dev for alpha, which is the intercept). I am not sure why then is sigma_b, the std dev for beta, not a shared param. If someone could share some wisdom on this, I would appreciate it.

The inclusion of the intercept term does introduce a huge inconvenience. In my case the model involves two sets of betas (and corresponding predictors), which means the estimate of 2(D + 1) is broken. I now have two intercept terms to introduce and will need to make adjustments to the mu_phi, omega_phi and dphi. Would someone confirm my understanding is correct.

avehtari · October 30, 2017, 2:57pm

Do you refer to page 15, which mentions 2(D + 1)? That example is logistic regression with not std dev for y
p. 15 in the article

dimensions d = 0, 1, . . . , D, groups j = 1, 2, . . . , J, and samples i = 1, 2, . . . , N_j . The first
element in the vector β_j corresponds to the intercept coefficient and correspondingly the first element
in the data vector x_i,j is set to one. The shared parameters inferred with EP are φ = (μ, log σ).

So 2(D + 1) matches this description.

The number of shared parameters is model dependent, and it should be possible to count the numbers of parameters exactly.

It is likely that if you have a different model than one of the examples, that you need to make adjustments to the code.

Can you tell more about your application?

Bhomass1 · October 31, 2017, 6:10am

Thanks, I do see now that the example on paper section 6.1 is not representative of the models in the code. The code proves to be cryptic in certain areas. Does any one know why the data elements such as what’s found in m1a.py

return data(
    X, y, {'sigma_x':sigma_x, 'Sigma_x':Sigma_x}, y_true, Nj, j_lim, 
    j_ind, {'phi':phi_true, 'alpha':alpha_j, 'beta':beta, 
    'sigma':sigma}      
)

where the 2 dictionaries get loaded in the data class (in common.py) as X_param and true_values never got used by anything except saved to file

   np.savez(
        os.path.join(RES_PATH, filename),
        J = J,
        D = D,
        npg = conf.npg,
        seed = conf.seed_data,
        pnames = pnames,
        uncertainty_global = uncertainty_global,
        uncertainty_group = uncertainty_group,
        X_param = data.X_param,
        **data.true_values
    )

avehtari · October 31, 2017, 10:00pm

Tuomas who wrote the code will answer you probably tomorrow

tuomassivula · November 1, 2017, 3:18pm

Hello,

The example code and the package in general definitely has some room for improvement in all areas (documentation, structure, user friendlyness, …). I’m actually just doing minor modifications in branch develop available in the repository.

The example in the paper in section 6.1 corresponds to the example model m4b defined in the file m4b.py
The dictionaries you referred to contain the used data generation process parameter values. These are not used in the code but they might be of interest e.g. when evaluating the results.

The X_param contains parameters used to generate the input data X. These affect how hard or easy the problem is. These are analysed and controlled separately in the data generation process. These parameters are not part of the model.

The true_values contain values used in the data generation process for parameters of the model, the shared parameters phi and other arbitrary local parameters.

I hope this helps a bit.

Kind regards,
Tuomas Sivula

Bhomass1 · November 3, 2017, 7:01am

Thank you Tuomas. This saves me the time to bother coding that part, which I don’t intend to use. And looking at m4b, I now can match up every piece. Definitely, these additional information will help anyone else doing serious work with this repo.

Bhomass1 · November 6, 2017, 2:07am

I ran into something which seems to be a serious limitation of the current EP setup. The data block in all the sample models are very uniform in the specified input. There seems to be no way to enter additional parameters without making modifications in the method.py code everywhere. Was this issue ever considered? Can Tuomas suggest a straightforward way to introduce additional input parameters? In my case, they are simply constants, which may change between different training data sets, but remains constant once fed into the stan code.

avehtari · November 6, 2017, 7:09pm

Since we’re revising the paper, I want to check that do you mean “the current code” or is there something unclear also in the paper?

Bhomass1 · November 6, 2017, 8:00pm

No, this is related to implementation, not the paper. I think my answer lies in the use of the dictionary A

self.data = dict(
    N=X.shape[0],
    X=X,
    y=y,
    mu_phi=self.vec,
    Omega_phi=self.Mat.T,  # Mat transposed in order to get C-order
    **A
)

I will try feed additional input data thru A.

tuomassivula · November 7, 2017, 5:49am

Yes, these A, A_k and A_n can be used to provide additional input for stan.

A={‘var’:3} provides var=3 to each site.
A_k={‘var’:[3,4]} provides var=3 to the first site and var=4 to the second.
A_n={‘var’:Z} slices Z to all the groupsas var=Z[slice_k] similarly as X and y.

If I have time in the future, I would change the interface to more user friendly one.

Bhomass1 · November 12, 2017, 2:11am

Tuomass, if I understand the code correctly, the final section where res_f_model.npz is saved is simply for comparison purposes, and isn’t necessary if all you want is the parallel computing results. Would I be right?

tuomassivula · November 14, 2017, 8:16am

Yes

Bhomass1 · November 20, 2017, 3:04am

another question. in example m4b, I follow the whole model setup, but I am not sure why the use of exp whenever dealing with sigma

`sigma_a <- exp(phi[2]);`

Is there a simple explanation for that?

avehtari · November 20, 2017, 10:05pm

It’s the usual transformation to constrain sigma_a to be positive and Gaussian approximation for phi[2] is more accurate than for sigma_a.

Bhomass1 · November 20, 2017, 11:08pm

thank you.

Topic		Replies	Views
Identification issue in Heckman model with correlated individual-specific intercepts Modeling fitting-issues , specification	1	364	August 10, 2022
Recovering parameter Modeling	5	829	September 27, 2019
Running time problem Modeling	3	667	June 7, 2017
Non-centered parameterization for multivariate model with heterogeneity Modeling rstan , specification	4	473	August 21, 2020
Help with misspecified model Modeling fitting-issues , specification	2	380	September 10, 2020

Why 2(d+1) parameters in ep Stan

Related Topics