# Read in a 2D array into Stan from Python

I’m generating “fake” experimental data from Python and feeding them into Stan to ensure I can fit to data accurately.

I have n cells with fixed positions. At every run i\in N, 30% of these cells are randomly chosen to be in the active state (index = 1). The rest are off (index = 0). At every run, I calculate a 2-component vector from a model that takes into account the activity status of the cells. At the end of N runs, I store the data in a 2D array of dimension N \times 2. I then pass this to Stan through two vectors, one for each component. There are no problems here. Just remember that each entry in the N iterations is obtained from a different set of activity status index.

For Stan to properly be able to fit to data, I need to tell it the activity status indices used for the generation of each point (point is defined as one entry in the N iterations, which is a vector of (x,y)). To achieve this, I store all the status indices in a 2D array of dimension N \times n in Python during my data generation. Dimension N is because we have N runs, and dimension n is because during each run, we need to assign status indices to all cells. This 2D array is defined like this in Python

for i in range(N):
ind_arr.append([])
for j in range(n):
ind_arr[i].append(cells[j].ind)


In Stan, I simply have real ind_arr[N,n]; in my data block. I put the 2D array from Python in a dictionary and pass it to Stan through the sampling method. My Stan model looks like this

data {
int<lower=0> N;         //number of configurations; each config gives one (bx, by)
int<lower=0> n;   //number of cells in every configuration
vector[n] x;
vector[n] y;
vector[N] bx;
vector[N] by;
real ind_arr[N,n];
}
transformed data {
real x_CM = 0.;
real y_CM = 0.;
vector[n] delta_n;
vector[n] delta_n_x;
vector[n] delta_n_y;

//Compute CM position from position data
for (k in 1:n) {
x_CM += x[k];
y_CM += y[k];
}
x_CM /= n;
y_CM /= n;

//Compute delta's from position data
for(k in 1:n) {
delta_n_x[k] = x[k] - x_CM;
delta_n_y[k] = y[k] - y_CM;
delta_n[k] = sqrt(delta_n_x[k] * delta_n_x[k] + delta_n_y[k] * delta_n_y[k]);
}
}
parameters {
real alpha;
real<lower=0> sigma;
}
model {

//model without WGN
vector[N] bx_temp;
vector[N] by_temp;

//set entries to zero
bx_temp = rep_vector(0.0,N);
by_temp = rep_vector(0.0,N);

//define model without WGN as
for (i in 1:N) {
for (j in 1:n) {
bx_temp[i] += ind_arr[i][j] * (alpha + dot_self(delta_n)) * delta_n_x[j];
by_temp[i] += ind_arr[i][j] * (alpha + dot_self(delta_n)) * delta_n_y[j];
}
}

//add noise and treat as the actual model
bx ~ normal(bx_temp, sigma);
by ~ normal(by_temp, sigma);
}


I get the wrong result for alpha, however, which here is the parameter I’m trying to fit for given my generated data from Python.

Am I doing something wrong with my reading of the 2D array? Or, am I doing something else wrong?

Python uses 0-based indexing, Stan uses 1-based indexing.

Also, to be on a safe side, transform your list of lists to ndarray, so the order is correct

So do this:

ind_arr = np.array(ind_arr) + 1


Thanks for the reply, in particular for reminding me about the 0-based and 1-based indexing. I tried your solution, but I still get the wrong results.

(1) Why should I transform the list into an ndarray, and what do you mean by “order is correct”? I believe the order of status indices placed in the list is correctly associated with the order of 2-component vectors obtained.
(2) The +1 shifts the element values of the ndarray up by 1. That’s not what we want, though, we want to shift the elements up by an index. Right?

Also, shouldn’t the indexing be automatically taken care of? Take for example

Python                                     feeding into Stan
index:     [0, 1, 2, 3]                              [1, 2 , 3, 4]
element:   [10, 20, 30, 40]                          [10, 20, 30, 40]


Does this not happen automatically?

Update
I don’t think the problem is with the 0-based vs 1-based indexing. I printed out the ind_arr from Stan like below

for (i in 1:N) {
print("[");
for (j in 1:N_cells) {
print(m_n_arr[i][j]);
}
print("]");
}


and it perfectly matches the original array printed from Python. The way I’m accessing elements to print here is the same way I access elements when using them in my model block.

I’m thinking the issue could be the following: the calculation I want to do with ind_arr should be done at the beginning of each chain, but the model block gets processed every leapfrog step. Now, I know pretty much nothing about how Stan works internally, so my guess is a wild one.

in which case, it should be done in the transformed data block which is executed exactly once, during model instantiation.