# Linear model with generated quantities - ypred issues?

Hello all,

I am still slowly moving along learning PyStan and shifting our models over. I am trying to understand how the generated quantities work. I have a simple linear model of ocean fish catch plotted against ocean net primary production (NPP). Previous work shows that this can be modeled roughly with a simple linear model. I also want to see how well the model does. I added a generated quantities section to the model.

When I run fit.plot([‘y_pred’]) that values look weird to me. They range 4 to 6 while the original data goes from -8 to 2.

The juypter notebook and raw data are here

regress_code = “”"

data {
int<lower=0> N;
int N2;
vector[N] y;
vector[N] x;
}

parameters {
real alpha;
real beta;
real<lower=0> sigma;
}

model {
alpha ~ normal(0,10);
beta ~ cauchy(0,5);
sigma ~ cauchy(0,5);
for (n in 1:N)
y[n] ~ normal(alpha + beta * x[n], sigma);
}

generated quantities {
real y_pred;

y_pred = normal_rng(alpha + beta * x[N2], sigma);

}

“”"

I picked cauchy for beta since in the worse case the slope could be zero but likely it’s some positive number.

Did I set my priors up correctly and ypred? Or should y_pred not be normal_rng? Should I log transform before the model or specify that in the model?

Thank you!
ara

1 Like

I may have made some headway after diving back into the Stan manual and some old Stan forum posts.

looks like you’re using only one value of the covariate `x` by setting `x[N2]` outside a loop.

i’d do something like this:

1 Like

Ok, let me try this and see if I can figure out how all this is working.

Thanks for taking the time!

Ara

@HeltonMaciel Thanks! I think I have it working. If I understand correctly I was only generating one y or a given x. With the for loop I get all the y’s for a given x.

I also went back to just a simple linear model and then built up the model to a generated quantity model.

Ara

Do the `y_pred` not look like the original `y` even if you simulate the original `y` according to the model?

In the raw data y goes from 0.000305 to 9.3 (mean ten year fish catch, tons per km2).

y_pred (mean) only covers part of this range 0.39 to 2.63. The max y_pred covers the upper range but the min y_pred goes negative. I found that weird since you can’t have a negative tonnage.

This leaves me wondering if I didn’t set up the model or priors correctly. Or the original data is just not that great.

My notebook for this is on github here.

Raw data is here.

That’s why I was suggesting simulating data from the model, then fitting it. Then the `y_pred` should look right. It’s the only way to separate model misspecification from implementation issues.

1 Like

Oh, sorry. I misunderstood. Let me try that.

Thanks!
ara

Just to make sure I am doing this right. I hit up the Stan manual 2.16 and on page 160 under Posterior Predictive Checks.

I set up in my model:

generated quantities {
vector[N] y_tilde;
for (n in 1:N)
y_tilde[n] = normal_rng(alpha + beta * x[n], sigma);
}

So I ended up with:

Random data

``````x_tilde = random.sample(range(50, 350), 100)
N_tilde = len(x_tilde)

np.mean(x_tilde, axis=0)
192.27000000000001
``````

Dictionary for Stan

``````Make a dictionary containing all data to be passed to STAN
regress_sim_dat = {'x': x,
'y': y,
'N': N,
'x_tilde':x_tilde,
'N_tilde':N_tilde}
``````

The model, priors, and generated data.

`````` regress_sim = """
data {
int<lower=0> N;
vector[N] y;
vector[N] x;
int<lower=0> N_tilde;
vector[N_tilde] x_tilde;
}
parameters {
real alpha;
real beta;
real<lower=0> sigma;
}
model {
alpha ~ normal(0,10);
beta ~ cauchy(0,5);
sigma ~ cauchy(0,5);

for (n in 1:N)
y[n] ~ normal(alpha + beta * x[n], sigma);
}
generated quantities {
vector[N_tilde] y_tilde;

for (n in 1:N_tilde)
y_tilde[n] = normal_rng(alpha + beta * x_tilde[n], sigma);
}
"""
``````

EDIT: Cleaned up the code blocks with ```

Yes, that looks right. To set off code, you want triple back ticks on their own line before and after:

``````