Optimized 2D structure for large dimensions

ptheguy · April 2, 2020, 4:35am

I’m feeding my data to Stan in the form of a 2D array of size (N,n) = (330,180000). This structure is coming from N independent examples, each having n active participants. For each example, I go through the active participants–examining their x and y coordinates separately–and perform a weighted sum, with the weight being f(r)=\alpha r^2+\beta. The data from this analysis is stored in the aforementioned 2D array and fed into Stan. I want Stan to infer the parameters \alpha=1.3 and \beta=0.5 and \sigma=0 that were used in the generation of the dataset.

quadratic_model = """
data {
   int N;     // number of independent examples in total (330)
   int n;     // number of active participants in each example -- controls number of dx,dy
   vector[N] bx;
   vector[N] by;
   vector[n] dx[N];        //all x variations within an example
   vector[n] dy[N];        //all y variations within an example
   vector[n] d_sqrd[N];    //distance squared within each example
}
transformed data {
}
parameters {
   real alpha;
   real beta;
   real sigma;
}
model {
   //model without WGN
   vector[N] bx_temp;
   vector[N] by_temp;
   vector[n] f_i;

   for (i in 1:N) {        // looping through examples
      // vectorized form
      f_i = alpha * d_sqrd[i] + beta;
      bx_temp[i] = sum(f_i .* dx[i]);
      by_temp[i] = sum(f_i .* dy[i]);
   }

   bx ~ normal(bx_temp,sigma);
   by ~ normal(by_temp,sigma);
}
generated quantities {
}
"""

The reason I’m inquiring is because currently Stan is chocking on the data and takes forever to complete an iteration! In fact, it doesn’t pass the first iteration within the first hour or so. I have tried cutting down the n dimension to something smaller. The Stan code finishes just fine, but with r_{hat} \sim 10^5.

mike-lawrence · April 2, 2020, 12:36pm

I think you might need to provide a bit more info on the operations you perform on dx in the model.

ptheguy · April 2, 2020, 6:54pm

Sure thing, I updated the original post.

Topic		Replies	Views
Large data sets with stan code Modeling	1	684	September 18, 2018
Increasing Stan efficiency by vectorizing for loop Modeling	6	680	October 9, 2022
Read in a 2D array into Stan from Python Modeling fitting-issues	6	1041	July 9, 2019
Vectorized != loop Modeling performance	12	942	December 29, 2020
Issues with heterogeneously normalized data Modeling techniques , fitting-issues , specification	6	549	April 23, 2020

Optimized 2D structure for large dimensions

Related topics