I’m generating “fake” experimental data from Python and feeding them into Stan to ensure I can fit to data accurately.
I have n cells with fixed positions. At every run i\in N, 30% of these cells are randomly chosen to be in the active state (index = 1). The rest are off (index = 0). At every run, I calculate a 2-component vector from a model that takes into account the activity status of the cells. At the end of N runs, I store the data in a 2D array of dimension N \times 2. I then pass this to Stan through two vectors, one for each component. There are no problems here. Just remember that each entry in the N iterations is obtained from a different set of activity status index.
For Stan to properly be able to fit to data, I need to tell it the activity status indices used for the generation of each point (point is defined as one entry in the N iterations, which is a vector of (x,y)). To achieve this, I store all the status indices in a 2D array of dimension N \times n in Python during my data generation. Dimension N is because we have N runs, and dimension n is because during each run, we need to assign status indices to all cells. This 2D array is defined like this in Python
for i in range(N):
ind_arr.append([])
for j in range(n):
ind_arr[i].append(cells[j].ind)
In Stan, I simply have real ind_arr[N,n]; in my data block. I put the 2D array from Python in a dictionary and pass it to Stan through the sampling method. My Stan model looks like this
data {
int<lower=0> N; //number of configurations; each config gives one (bx, by)
int<lower=0> n; //number of cells in every configuration
vector[n] x;
vector[n] y;
vector[N] bx;
vector[N] by;
real ind_arr[N,n];
}
transformed data {
real x_CM = 0.;
real y_CM = 0.;
vector[n] delta_n;
vector[n] delta_n_x;
vector[n] delta_n_y;
//Compute CM position from position data
for (k in 1:n) {
x_CM += x[k];
y_CM += y[k];
}
x_CM /= n;
y_CM /= n;
//Compute delta's from position data
for(k in 1:n) {
delta_n_x[k] = x[k] - x_CM;
delta_n_y[k] = y[k] - y_CM;
delta_n[k] = sqrt(delta_n_x[k] * delta_n_x[k] + delta_n_y[k] * delta_n_y[k]);
}
}
parameters {
real alpha;
real<lower=0> sigma;
}
model {
//model without WGN
vector[N] bx_temp;
vector[N] by_temp;
//set entries to zero
bx_temp = rep_vector(0.0,N);
by_temp = rep_vector(0.0,N);
//define model without WGN as
for (i in 1:N) {
for (j in 1:n) {
bx_temp[i] += ind_arr[i][j] * (alpha + dot_self(delta_n)) * delta_n_x[j];
by_temp[i] += ind_arr[i][j] * (alpha + dot_self(delta_n)) * delta_n_y[j];
}
}
//add noise and treat as the actual model
bx ~ normal(bx_temp, sigma);
by ~ normal(by_temp, sigma);
}
I get the wrong result for alpha, however, which here is the parameter I’m trying to fit for given my generated data from Python.
Am I doing something wrong with my reading of the 2D array? Or, am I doing something else wrong?