Is it possible to run two models in the block model and use the result of one as data to the other?

I aim to propagate uncertainty on a predictor x in a linear model (y=a+bx). I know how to do it by using mean and sd (see for example the Stan manual). However, I was wondering if I could use all the x data, model the mean of x for each group of interest (individuals in the following example) within Stan, and use this mean for the model predicting an outcome y. I wrote some Stan code that compiles, but encounter errors while running on data. I think I have issues with indexingâ€¦ unless this is just not possible.

Here is my unglorious Stan code:

```
data {
int<lower=0> N; // number of observations
vector[N] y; // vector of observations
int<lower=0> N_ind; // number of individuals
array[N] int<lower=1> individual; // index of individuals
int<lower=0> N_x; // number of predictor observations
vector[N_x] x; // predictor observations (multiple obs per individual)
array[N_x] int<lower=0> individual_x; // index of individuals for x (index has to match with individual above)
}
parameters {
vector[N_ind] x_bar;
real<lower=0> sigma_x;
real alpha;
real beta;
real<lower=0> sigma;
}
model {
// model for mean x per individual
x_bar ~ normal(0, 1); // prior
sigma_x ~ exponential(1); // prior
x ~ normal(x_bar[individual_x], sigma_x); // likelihood 1
// model for y
alpha ~ normal(0, 1); // prior
beta ~ normal(0, 1); // prior
sigma ~ exponential(1); // prior
for (i in 1:N) {
y[i] ~ normal(alpha + beta * x_bar[individual[i]], sigma); // likelihood 2
}
}
```

Yes, and indeed this is the ideal approach.

Can you describe your data and the process by which it was collected in a bit more detail?

Well, the code I show above was an attempt for a toy example/theoretical question. But reading that you asked for details on the data and collection process, I now see the kind of rabbit hole I might be about to jump in!

For example, as I work with ecological/environmental data, a typical dataset I could be working on would involve modeling something measured once at multiple locations along a climatic gradient, and building a model to predict this thing with climate variables (for example mean annual temperature, MAT). As I expect the quantity of the measured thing to be the result of the climate of the past 30 years but I did not measure the thing every year for 30 years, I could take the mean of the 30 MAT from the years preceding sampling and use that as a proxy for the average temperature at the locationâ€¦ However, I know climate varies from year to year and I would like to account for this variation as an uncertainty around the mean of MAT. The model I attempted above (intercept only for x) would be the simplest way to do that I could think about. However, it ignores that MAT is actually a time series in this example, so things could quickly become a bit more complex!

This generalization might help: if you can make observations of x multiple times under conditions where the true value for x shouldnâ€™t be different, but the measurements of x are in fact different due to unmodeled phenomena, then including a measurement error model of x and using the parameter reflecting the true x as the regressor for y will be optimal.

If your x measurements are such that each is expected to vary in the true value for x, then youâ€™re out of luck and have to stick with measured x as your regressor for y.

I kind of see what you mean, but the climate variable example bugs me. I expect each yearly MAT to be different, but the x I want to use is the mean of 30 MAT. Since itâ€™s a mean, it has associated uncertainty which may be of interest in the model y=f(x)â€¦ I think about locations with wide ranges of MAT vs others with narrower ranges.

I should correct my generalization: whenever a collection of x measurements can be modelled with a glm that includes an intercept, you can then use that intercept to predict any y that is associated with that collection. This includes:

```
x_obs[,collection_i] ~ normal( intercept , noise );
x_obs[,collection_i] ~ normal( intercept + beta*x_preds[,collection_i], noise );
```

and many other configurations of distribution (so, not just normal) and x-predictive model (many betas, or Gaussian process, etc).

1 Like

Thank you for your insights @mike-lawrence. Do you, or anyone else, have any clue about what may be wrong in my Stan code in the initial post?

I figured out the issue in my Stan code. I was trying to vectorize the submodel for x (â€ślikelihood 1â€ť) but it doesnâ€™t work as x has a different length as yâ€¦

This line:

```
x ~ normal(x_bar[individual_x], sigma_x); // likelihood 1
```

should be:

```
for (j in 1:N_x) {
x[j] ~ normal(x_bar[individual_x[j]], sigma_x); // likelihood 1
}
```

2 Likes