How to fit and combine submodels into a single stanfit object?

realkrantz · January 15, 2019, 8:45pm

Dear stan community. I would appreciate any help to do this:

(1) for each P fit the model for each column of weights.
(2) do the step 1 for all observations in the dataset to get p * w submodels, where p is the number of P and w is the number of columns of weights.
(3) pool the results by combining the posterior samples of the submodels.

A similar brms implementation is available here, also discussed here. But that allows only incorporation of uncertainty coming from multiple imputation P. I want to account for the uncertainty coming from the estimation of the weights as well.

@paul.buerkner says there that we could also extract those data sets and then pass them to the actual model fitting function as a list of data frames. Using this approach, the solution would probably be to pass my individual p * w data frames, each with p=1 and w=1. Still, it is not clear how the list of these data frames could in practice be passed to the model fitting function.

In my case, the multiply imputted values of the predictors are in long format with each imputation stage represented by a value of the variable P, and the set of weights estimated at each P are in wide format and represented by the variables w_1 to w_10.

There is also the stan approach discussed here, and here. This requires calling “stan by parallel and it returns a list of stanfit objects,” and then use sflist2stanfit to create one stanfit object from the list. Here the problem is that I would need to separate my dataframe into p * w = 50, or more than 100 datasets (if I increase the number of imputations P to more than 10 as generally recommended), each with p=1 and w=1.

Following @BobCarpenter’s recommendation, below I present my attempt to parallelize the likelihood. It seems this approach could allow me to account for one of the sources of uncertainty separately, but not both jointly as intended. Here I am not certain how to specify the transformed parameters block and the likelihood to account for the uncertainty from P too.

Any help to improve and/or correct my current attempt to accomplish my objective would be much appreciated. Any input to implement any of the other approaches discussed above would also be much appreciated. I apologise if you find any basic mistakes in my code. While I am certain about what I want to do and why, the stan implementation of my ideas is still challenging - still learning. This question has also been posted here.

#model

data{
int N;                                     
int P;                 
int ncases[N];         
int A[N];                
int B[N];              
int nn[N];              
int id[N];               
real w_1[N];       
real w_2[N];       
real w_3[N];      
real w_4[N];       
real w_5[N];       
real w_6[N];       
real w_7[N];       
real w_8[N];       
real w_9[N];       
real w_10[N];       
}

parameters {
real beta_0; 
real beta_1; 
real beta_2; 
real beta_3; 
}

transformed parameters {
vector[N] pp_hat;  
vector[N] odds;    
for (i in 1:N) { 
odds[i] = exp(beta_0) * (1 + beta_1*A[i] + beta_2*B[i] + beta_3*A[i]*B[i]);
pp_hat[i] = odds[i]/(1 + odds[i]);
}
}

model {
beta_0 ~ normal(0, 5);
beta_1 ~ normal(0, 5);
beta_2 ~ normal(0, 5);
beta_3 ~ normal(0, 5);
for (i in 1:N){
  target += w_1[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
  target += w_2[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
  target += w_3[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
  target += w_4[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
  target += w_5[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
  target += w_6[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
  target += w_7[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
  target += w_8[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
  target += w_9[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
  target += w_10[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
 }
}

Thanks in advance.

jjramsey · January 15, 2019, 9:28pm

I cannot compile your Stan program. Instead, I get the following error:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

dimension declaration requires expression denoting integer; found type=int[]
  error in 'model883edd96636_foo' at line 10, column 8
  -------------------------------------------------
     8: real w[N];                                //weights
     9: int M[N];                             // number of imputations
    10: matrix[M, w] weights;         // matrix containing weights for each imputation
               ^
    11: }
  -------------------------------------------------

Make sure that the Stan program that you post is the one that actually yielded the error message that you posted.

realkrantz · January 15, 2019, 9:29pm

Thanks @jjramsey . Just edited the error message. Any thoughts how to help me accomplish my objective?

jjramsey · January 15, 2019, 9:37pm

The meaning of the error message is straightforward. The declaration matrix[M,w] weights; requires M to be an integer, but you’ve declared M to be an array of integers. Unfortunately, since it’s not clear to me why you did that, there’s only so much I can do to help.

Offhand, I’d suggest reading through the Stan 2.18.2 reference manual so that you actually understand what you are writing.

realkrantz · January 16, 2019, 9:41am

@jjramsey. Just solved the problem of data specification in data block. Any help would be much appreciated.

jjramsey · January 16, 2019, 1:13pm

Again, the error message is straightforward. It’s pointing at A[i,w]; as you can see from the “^” below line 25. The error message says “indexes found=2”, and indeed, A[i,w] has 2 indices. It says “expression dimensions=1”, and indeed, as seen in the data block, A is indeed declared as a 1-dimensional array.

I’m sorry to be so blunt, but the mistake you are making is very basic, and it looks to me like the sort of mistake that would come from someone cobbling together bits and pieces from example code snippets without understanding how they fit together. Again, I recommend reading through the Stan reference manual, especially the section on language, so that you can better understand the programs that you are writing.

jjramsey · January 16, 2019, 7:24pm

All I really can offer at this point is this:

Start with a simplified version of your model, then make sure that works before adding further complexity to the model
Don’t do “voodoo programming”, that is, don’t just copy pieces of code and put them together without knowing what the pieces are doing and how they interact. Make sure you understand your own code.

Sorry I don’t have more to add.

realkrantz · January 17, 2019, 10:49am

Hi all. This question has been updated and also posted here. Any help would be much appreciated.

Topic		Replies	Views
Combining posteriors rstanarm	1	557	February 18, 2019
Add samples to existing brmsfit? brms	2	943	August 21, 2019
How to update a brmsfit object with a modified brms-generated stan model marginalizing over a distribution of weights Modeling techniques , specification	9	2744	December 20, 2021
Efficient fitting of multiple dataset/model combinations Publicity	0	1123	January 28, 2019
Fitting Model -> Predicting Model -> Fitting a 2nd Model Using Predicted Values: All within one iteration? Modeling	0	396	December 20, 2018

How to fit and combine submodels into a single stanfit object?

Related topics