Statistics on random numbers in model block

Maqs · August 7, 2017, 1:30pm

Hi,
I am new to Stan and recently tried to make a model in which I tried to fit both the mean and standard deviation for let’s say a linear relationship with y = 2x+5. For each x, I have a mean(y) and a sd(y) in my dataset (but not the original measurements!). My idea was to fit a model by doing a simulation step in “model” which might look like this:

real y_pred[5000];
for (j in 1:5000) {
y_pred[j] = normal_rng(ymean,sigma);
}

Then, my mean(y) would be represented by mean(y_pred) and my sd(y) by sd(y_pred)… However, the random number generators can’t be used in the model block. Is there a way to do this?

Thank you in advance
Max

dlakelan · August 7, 2017, 3:53pm

what are the unknowns here? what is the data. you need to give us more details.

suppose you have x measured multiple times, and y measured multiple times for each x, but you only have access to mean(y) and sd(y) for each x… the 2 and 5 are unknown, call them a,b

y[i] ~ normal(a*x[i]+b,sdy[i])

if you’re doing something else, then really we need more info.

Maqs · August 8, 2017, 7:20am

Sorry for the insufficient information. The problem is slightly more complex, I am actually thinking about data that is modeled via differential equations and parameters that are linked to y in a non-trivial way. Therefore, I would like to implement a Monte Carlo integration step which generates y_pred based on a number of combined random variables. Let’s say it’s a system of linear differential equations with 6 parameters and data (y) is the mean and variance of the area under the first moment curve with respect to two differential equations. No closed-form solution exists and I would therefore use Monte Carlo integration to find the expected value/variance. Is this clearer?

In the end, the question is whether a Monte Carlo integration step can be modeled in the model block, whereas the resulting quantity can be treated as deterministic.

betanalpha · August 8, 2017, 12:43pm

Presumably your problem is that your model looks something like

y ~ some_distribution(E[f(theta)], phi);

or even worse

y ~ some_distribution(f(theta), phi);
y_ave = mean(y);
// only y_ave is stored

The former, where your likelihood explicitly depends on an expectation, cannot be implemented directly in Stan unless you can calculate the integral. If the integral is one-dimensional then you can use the integrate_ode function to compute the integral, or you can wait for a future version of Stan will which have a a tanh-sinh quadrature function. Alternatively you can try to Monte Carlo it as you suggest but it will be extremely slow and it is unlikely that you will get a sufficiently accurate answer (and the gradient will be even worse).

The latter is more tricky because it’s essentially a missing data problem – keeping only a summary statistics loses a huge amount of information. The best approach would be to analytically marginalize over the likelihood to get an exact likelihood for the summary statistic, although this will be hard to do outside of the Gaussian case. Another option is to treat all N data points as new latent parameters and then approximate a likelihood for p(y_summary | y_1, ..., y_N). See the “Measurement Model” section of the manual for some examples.

dlakelan · August 8, 2017, 4:44pm

@Maqs

can you do something like create a vector y from transformed parameters,

y = ode_solution(params,inits)

and then, with your data ymean:

ymean ~ normal(mean(y),…)

???

Is this not what you mean?

Topic		Replies	Views
Generate random number in transformed parameters block Modeling specification	2	836	May 9, 2018
Setting response not as a distribution but rather as an equality Modeling	2	251	October 5, 2023
Generating random numbers in the model Modeling	20	5144	July 23, 2021
How to generate Random Number for User defined model Modeling fitting-issues	6	678	June 19, 2022
Specifying the number of samples for rng Modeling	6	2412	November 3, 2017

Statistics on random numbers in model block

Related topics