Why 2(d+1) parameters in ep Stan

Gelman’s paper on EP says there are 2(D+1) shared parameters for a model with D explanatory variables. By looking at sample code m1a.py, apparently the two extra shared parameters are sigma (std dev for y), and sigma_a (std dev for alpha, which is the intercept). I am not sure why then is sigma_b, the std dev for beta, not a shared param. If someone could share some wisdom on this, I would appreciate it.

The inclusion of the intercept term does introduce a huge inconvenience. In my case the model involves two sets of betas (and corresponding predictors), which means the estimate of 2(D + 1) is broken. I now have two intercept terms to introduce and will need to make adjustments to the mu_phi, omega_phi and dphi. Would someone confirm my understanding is correct.

Do you refer to page 15, which mentions 2(D + 1)? That example is logistic regression with not std dev for y
p. 15 in the article

dimensions d = 0, 1, . . . , D, groups j = 1, 2, . . . , J, and samples i = 1, 2, . . . , N_j . The first
element in the vector β_j corresponds to the intercept coefficient and correspondingly the first element
in the data vector x_i,j is set to one. The shared parameters inferred with EP are φ = (μ, log σ).

So 2(D + 1) matches this description.

The number of shared parameters is model dependent, and it should be possible to count the numbers of parameters exactly.

It is likely that if you have a different model than one of the examples, that you need to make adjustments to the code.

Can you tell more about your application?

Thanks, I do see now that the example on paper section 6.1 is not representative of the models in the code. The code proves to be cryptic in certain areas. Does any one know why the data elements such as what’s found in m1a.py

return data(
    X, y, {'sigma_x':sigma_x, 'Sigma_x':Sigma_x}, y_true, Nj, j_lim, 
    j_ind, {'phi':phi_true, 'alpha':alpha_j, 'beta':beta, 

where the 2 dictionaries get loaded in the data class (in common.py) as X_param and true_values never got used by anything except saved to file

        os.path.join(RES_PATH, filename),
        J = J,
        D = D,
        npg = conf.npg,
        seed = conf.seed_data,
        pnames = pnames,
        uncertainty_global = uncertainty_global,
        uncertainty_group = uncertainty_group,
        X_param = data.X_param,

Tuomas who wrote the code will answer you probably tomorrow


The example code and the package in general definitely has some room for improvement in all areas (documentation, structure, user friendlyness, …). I’m actually just doing minor modifications in branch develop available in the repository.

The example in the paper in section 6.1 corresponds to the example model m4b defined in the file m4b.py
The dictionaries you referred to contain the used data generation process parameter values. These are not used in the code but they might be of interest e.g. when evaluating the results.

The X_param contains parameters used to generate the input data X. These affect how hard or easy the problem is. These are analysed and controlled separately in the data generation process. These parameters are not part of the model.

The true_values contain values used in the data generation process for parameters of the model, the shared parameters phi and other arbitrary local parameters.

I hope this helps a bit.

Kind regards,
Tuomas Sivula

1 Like

Thank you Tuomas. This saves me the time to bother coding that part, which I don’t intend to use. And looking at m4b, I now can match up every piece. Definitely, these additional information will help anyone else doing serious work with this repo.

1 Like

I ran into something which seems to be a serious limitation of the current EP setup. The data block in all the sample models are very uniform in the specified input. There seems to be no way to enter additional parameters without making modifications in the method.py code everywhere. Was this issue ever considered? Can Tuomas suggest a straightforward way to introduce additional input parameters? In my case, they are simply constants, which may change between different training data sets, but remains constant once fed into the stan code.

Since we’re revising the paper, I want to check that do you mean “the current code” or is there something unclear also in the paper?

No, this is related to implementation, not the paper. I think my answer lies in the use of the dictionary A

self.data = dict(
    Omega_phi=self.Mat.T,  # Mat transposed in order to get C-order

I will try feed additional input data thru A.

Yes, these A, A_k and A_n can be used to provide additional input for stan.

  • A={‘var’:3} provides var=3 to each site.
  • A_k={‘var’:[3,4]} provides var=3 to the first site and var=4 to the second.
  • A_n={‘var’:Z} slices Z to all the groupsas var=Z[slice_k] similarly as X and y.

If I have time in the future, I would change the interface to more user friendly one.

Tuomass, if I understand the code correctly, the final section where res_f_model.npz is saved is simply for comparison purposes, and isn’t necessary if all you want is the parallel computing results. Would I be right?


another question. in example m4b, I follow the whole model setup, but I am not sure why the use of exp whenever dealing with sigma

`sigma_a <- exp(phi[2]);`

Is there a simple explanation for that?

It’s the usual transformation to constrain sigma_a to be positive and Gaussian approximation for phi[2] is more accurate than for sigma_a.

thank you.