Hi,
I am new to Stan and recently tried to make a model in which I tried to fit both the mean and standard deviation for let’s say a linear relationship with y = 2x+5. For each x, I have a mean(y) and a sd(y) in my dataset (but not the original measurements!). My idea was to fit a model by doing a simulation step in “model” which might look like this:

real y_pred[5000];
for (j in 1:5000) {
y_pred[j] = normal_rng(ymean,sigma);
}

Then, my mean(y) would be represented by mean(y_pred) and my sd(y) by sd(y_pred)… However, the random number generators can’t be used in the model block. Is there a way to do this?

what are the unknowns here? what is the data. you need to give us more details.

suppose you have x measured multiple times, and y measured multiple times for each x, but you only have access to mean(y) and sd(y) for each x… the 2 and 5 are unknown, call them a,b

y[i] ~ normal(a*x[i]+b,sdy[i])

if you’re doing something else, then really we need more info.

Sorry for the insufficient information. The problem is slightly more complex, I am actually thinking about data that is modeled via differential equations and parameters that are linked to y in a non-trivial way. Therefore, I would like to implement a Monte Carlo integration step which generates y_pred based on a number of combined random variables. Let’s say it’s a system of linear differential equations with 6 parameters and data (y) is the mean and variance of the area under the first moment curve with respect to two differential equations. No closed-form solution exists and I would therefore use Monte Carlo integration to find the expected value/variance. Is this clearer?

In the end, the question is whether a Monte Carlo integration step can be modeled in the model block, whereas the resulting quantity can be treated as deterministic.

Presumably your problem is that your model looks something like

y ~ some_distribution(E[f(theta)], phi);

or even worse

y ~ some_distribution(f(theta), phi);
y_ave = mean(y);
// only y_ave is stored

The former, where your likelihood explicitly depends on an expectation, cannot be implemented directly in Stan unless you can calculate the integral. If the integral is one-dimensional then you can use the integrate_ode function to compute the integral, or you can wait for a future version of Stan will which have a a tanh-sinh quadrature function. Alternatively you can try to Monte Carlo it as you suggest but it will be extremely slow and it is unlikely that you will get a sufficiently accurate answer (and the gradient will be even worse).

The latter is more tricky because it’s essentially a missing data problem – keeping only a summary statistics loses a huge amount of information. The best approach would be to analytically marginalize over the likelihood to get an exact likelihood for the summary statistic, although this will be hard to do outside of the Gaussian case. Another option is to treat all N data points as new latent parameters and then approximate a likelihood for p(y_summary | y_1, ..., y_N). See the “Measurement Model” section of the manual for some examples.