Hi all,

I am trying to solve a biological problem with a GP model, where I cannot directly observe the time of the observations. In particular, I have

```
regulator ~ GP(gp_sigma, gp_length)
target = f(regulator) //here f is a function over functions
time_position[i] ~ Beta(shape1, shape2)
regulator_expression[i] ~ N(regulator(time_position[i]), expression_sigma)
target_expression[i] ~ N(target(time_position[i]), expression_sigma)
```

And I can observe only `regulator_expression`

and `target_expression`

, not `time_position`

. I am interested in multiple forms of `f`

, but cannot fit it for any, so I decided to try to work with a simpler model where `f(g)(x) = x`

, i.e. `f`

maps any function to identity, in hopes of learning what’s wrong. Now the problem can be seen as having noisy observations of the GP (`regulator_expression`

) and noisy observations of the associated time (`target_expression`

) - still the model does not converge, even if all the hyperparameters are known. Since if the `time_position`

is observable, the GP alone converges, I assume, the added noise in observation time is responsible for the issues.

Do you have any ideas to try? Or is there a reason to believe that this class of models is beyond the reach of current samplers?

Here is the model code, I have also attached it in a file with an R script that generates data from the model, tries to fit the data with Stan and shows a graphical representation of the fit. Usually most of the chains get close to the true values of `regulator`

over time, but some chains contain nonsense and there are divergent transitions and often the max treedepth is exceeded. I’ve run through a bunch of tweaks and reparametrizations, but nothing worked and I am out of ideas.

Thanks for any help!

The model:

```
data {
int num_cells;
vector[num_cells] regulator_expression;
vector[num_cells] target_expression;
real expression_sigma;
real gp_sigma;
real gp_length;
real time_shape_1;
real time_shape_2;
}
parameters {
real<lower = 0, upper = 1> time_position[num_cells];
vector[num_cells] regulator_true_raw;
}
transformed parameters {
vector[num_cells] regulator_true;
vector[num_cells] target_estimate;
{
//new scope to introduce vars that are not stored in output
matrix[num_cells, num_cells] cov_m = cov_exp_quad(time_position, gp_sigma, gp_length);
matrix[num_cells, num_cells] cov_m_chol;
//Add jitter to ensure positive semidefiniteness
for(i in 1:num_cells) {
cov_m[i,i] = cov_m[i,i] + 1e-6;
}
cov_m_chol = cholesky_decompose(cov_m);
regulator_true = cov_m_chol * regulator_true_raw;
target_estimate = to_vector(time_position);
}
}
model {
regulator_expression ~ normal(regulator_true, expression_sigma);
target_expression ~ normal(target_estimate, expression_sigma);
regulator_true_raw ~ normal(0,1);
time_position ~ beta(time_shape_1, time_shape_2);
}
```

An example model fit - lines are samples from the `regulator`

posterior and points are the true values.

gp_position_test.stan (1.2 KB)

gp_position_test.R (4.5 KB)