Read in a 2D array into Stan from Python

ptheguy · July 8, 2019, 7:54pm

I’m generating “fake” experimental data from Python and feeding them into Stan to ensure I can fit to data accurately.

I have n cells with fixed positions. In each run i = 1, ..., N, 30% of these cells are randomly chosen to be in the active state (index = 1). The rest are off (index = 0). In each run, I calculate a 2-component vector from a model that takes into account the activity status of the cells. At the end of N runs, I store the data in a 2D array of dimension N \times 2. I then pass this to Stan through two vectors, one for each component. There are no problems here. Just remember that each of the N entries is obtained from a different set of activity status indices.

For Stan to properly be able to fit to data, I need to tell it the activity status indices used for the generation of each point (point is defined as one entry in the N iterations, which is a vector of (x,y)). To achieve this, I store all the status indices in a 2D array of dimension N \times n in Python during my data generation. Dimension N is because we have N runs, and dimension n is because during each run, we need to assign status indices to all cells. This 2D array is defined like this in Python

ind_arr = []  # one row per run, one column per cell
for i in range(N):
   ind_arr.append([])
   for j in range(n):
      ind_arr[i].append(cells[j].ind)
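
The same N \times n array can also be built as a NumPy array, which makes the shape easy to check before it goes into the data dictionary. This is only a minimal sketch, assuming NumPy is available. The `Cell` class, the seed, and the sizes are hypothetical stand-ins for the poster's own objects, and the sketch assumes a single list of `cells` whose status is resampled on every run:

```python
import numpy as np

rng = np.random.default_rng(0)   # hypothetical seed
N, n = 5, 10                     # hypothetical sizes: N runs, n cells


class Cell:
    """Hypothetical stand-in for the poster's cell objects."""

    def __init__(self):
        self.ind = 0


cells = [Cell() for _ in range(n)]
n_active = round(0.3 * n)        # 30% of the cells are active in each run

ind_arr = np.zeros((N, n))
for i in range(N):
    # pick the active cells for this run, without repeats
    active = rng.choice(n, size=n_active, replace=False)
    for j, cell in enumerate(cells):
        cell.ind = 1 if j in active else 0
    # row i holds the status of every cell during run i
    ind_arr[i] = [cell.ind for cell in cells]

# x, y, bx and by would be added to the dictionary in the same way
data = {"N": N, "n": n, "ind_arr": ind_arr}
```

Checking `ind_arr.shape` against `(N, n)` before sampling is a cheap way to catch a transposed or ragged array.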

In Stan, I simply have real ind_arr[N,n]; in my data block. I put the 2D array from Python in a dictionary and pass it to Stan through the sampling method. My Stan model looks like this

data {
   int<lower=0> N;         //number of configurations; each config gives one (bx, by)
   int<lower=0> n;   //number of cells in every configuration
   vector[n] x;
   vector[n] y;
   vector[N] bx;
   vector[N] by;
   real ind_arr[N,n];
}
transformed data {
   real x_CM = 0.;
   real y_CM = 0.;
   vector[n] delta_n;
   vector[n] delta_n_x;
   vector[n] delta_n_y;

   //Compute CM position from position data
   for (k in 1:n) {
      x_CM += x[k];
      y_CM += y[k];
   }
   x_CM /= n;
   y_CM /= n;

   //Compute delta's from position data
   for(k in 1:n) {
      delta_n_x[k] = x[k] - x_CM;
      delta_n_y[k] = y[k] - y_CM;
      delta_n[k] = sqrt(delta_n_x[k] * delta_n_x[k] + delta_n_y[k] * delta_n_y[k]);
   }
}
parameters {
   real alpha;
   real<lower=0> sigma;
}
model {

   //model without WGN
   vector[N] bx_temp;
   vector[N] by_temp;

   //set entries to zero
   bx_temp = rep_vector(0.0,N);
   by_temp = rep_vector(0.0,N);

   //define model without WGN as
   for (i in 1:N) {
      for (j in 1:n) {
         bx_temp[i] += ind_arr[i][j] * (alpha + dot_self(delta_n)) * delta_n_x[j];
         by_temp[i] += ind_arr[i][j] * (alpha + dot_self(delta_n)) * delta_n_y[j];
      }
   }
   
   //add noise and treat as the actual model
   bx ~ normal(bx_temp, sigma);
   by ~ normal(by_temp, sigma);
}

I get the wrong result for alpha, however, which here is the parameter I’m trying to fit for given my generated data from Python.

Am I doing something wrong with my reading of the 2D array? Or, am I doing something else wrong?
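
One way to separate a data-passing problem from a modelling problem is to recompute the noise-free part of the Stan model in Python and compare it with the generated data. The sketch below mirrors the transformed data and model blocks of the Stan program in NumPy. The function name `stan_mean` and the toy inputs are hypothetical, and it is not the poster's generator:

```python
import numpy as np


def stan_mean(alpha, x, y, ind_arr):
    """Noise-free (bx_temp, by_temp) as the Stan model block computes them."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ind_arr = np.asarray(ind_arr, dtype=float)   # shape (N, n)

    # transformed data: offsets of each cell from the centre of mass
    delta_n_x = x - x.mean()
    delta_n_y = y - y.mean()
    delta_n = np.sqrt(delta_n_x**2 + delta_n_y**2)

    # dot_self(delta_n) in Stan is the sum of squares over all n cells
    coeff = alpha + np.dot(delta_n, delta_n)

    # model block: sum over cells j of ind_arr[i][j] * coeff * delta_n_x[j]
    bx_temp = ind_arr @ (coeff * delta_n_x)
    by_temp = ind_arr @ (coeff * delta_n_y)
    return bx_temp, by_temp


# hypothetical toy input: 3 cells, 2 runs
x = [0.0, 1.0, 2.0]
y = [0.0, 0.0, 3.0]
ind_arr = [[1, 0, 0], [0, 1, 1]]
bx_temp, by_temp = stan_mean(0.5, x, y, ind_arr)
```

If the generator's noise-free output differs from `stan_mean` for the same `alpha`, the generator and the Stan program are probably not the same model, which would explain a wrong fitted `alpha` even when the array is read in correctly.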

ahartikainen · July 8, 2019, 10:35pm

Python uses 0-based indexing, Stan uses 1-based indexing.

Also, to be on the safe side, transform your list of lists to an ndarray, so the order is correct.

So do this:

ind_arr = np.array(ind_arr) + 1

ptheguy · July 8, 2019, 11:03pm

Thanks for the reply, in particular for reminding me about the 0-based and 1-based indexing. I tried your solution, but I still get the wrong results.

(1) Why should I transform the list into an ndarray, and what do you mean by “order is correct”? I believe the order of status indices placed in the list is correctly associated with the order of 2-component vectors obtained.
(2) The +1 shifts the element values of the ndarray up by 1. That’s not what we want, though, we want to shift the elements up by an index. Right?

Also, shouldn’t the indexing be automatically taken care of? Take for example

           Python              feeding into Stan
index:     [0, 1, 2, 3]        [1, 2, 3, 4]
element:   [10, 20, 30, 40]    [10, 20, 30, 40]

Does this not happen automatically?
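
The picture above can be checked on the Python side. A small sketch, assuming NumPy: the stored values do not change when the array is handed over, only the position labels differ between the two languages, whereas adding 1 to the array changes the values themselves.

```python
import numpy as np

element = np.array([10, 20, 30, 40])

python_index = list(range(len(element)))     # 0-based positions
stan_index = [k + 1 for k in python_index]   # 1-based labels for the same positions

# each value carries two labels, one per language
pairs = [(p, s, int(element[p])) for p, s in zip(python_index, stan_index)]

# adding 1 to the array shifts the values, not the positions
shifted = element + 1
```

For a 0/1 status array, the `+ 1` would turn the entries into 1 and 2, which is a different data set rather than a re-indexed one.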

ptheguy · July 8, 2019, 11:42pm

Update
I don’t think the problem is with the 0-based vs 1-based indexing. I printed out the ind_arr from Stan like below

for (i in 1:N) {
   print("[");
   for (j in 1:n) {
      print(ind_arr[i][j]);
   }
   print("]");
}

and it perfectly matches the original array printed from Python. The way I’m accessing elements to print here is the same way I access elements when using them in my model block.

I’m thinking the issue could be the following: the calculation I want to do with ind_arr should be done at the beginning of each chain, but the model block gets processed every leapfrog step. Now, I know pretty much nothing about how Stan works internally, so my guess is a wild one.

mitzimorris · July 9, 2019, 2:51am

In that case, it should be done in the transformed data block, which is executed exactly once, during model instantiation.
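
For the model in the first post, the inner sum over cells does not involve `alpha`, so it is a candidate for this kind of one-time computation. A rough NumPy sketch of the algebra follows. It is not the poster's code, and the names `w_x`, `w_y`, `s` and the toy numbers are hypothetical:

```python
import numpy as np

# hypothetical toy values: 3 cells, 2 runs
delta_n_x = np.array([-1.0, 0.0, 1.0])
delta_n_y = np.array([-1.0, -1.0, 2.0])
ind_arr = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 1.0]])
s = np.sum(delta_n_x**2 + delta_n_y**2)   # equals dot_self(delta_n)

# computed once: per-run sums over the active cells
w_x = ind_arr @ delta_n_x
w_y = ind_arr @ delta_n_y


def mean_from_precomputed(alpha):
    """Only a scalar multiplication remains for each new alpha."""
    return (alpha + s) * w_x, (alpha + s) * w_y


def mean_from_loops(alpha):
    """Same quantity with the double loop used in the model block."""
    N, n = ind_arr.shape
    bx = np.zeros(N)
    by = np.zeros(N)
    for i in range(N):
        for j in range(n):
            bx[i] += ind_arr[i, j] * (alpha + s) * delta_n_x[j]
            by[i] += ind_arr[i, j] * (alpha + s) * delta_n_y[j]
    return bx, by
```

In Stan, the analogue would be to declare and fill the two per-run sums in `transformed data`, leaving only the multiplication by `(alpha + dot_self(delta_n))` in the model block.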

ahartikainen · July 9, 2019, 5:04am

Ok, I misread the problem.

ptheguy · July 9, 2019, 3:15pm

Problem is fixed. Thank you. Issue was with my model, not the Stan code!
