Hi everyone!

I have been dealing with an apparently simple task, but I am stuck with some results I am not confortable with. The task is the following: I have two features (x1 and x2) and a (response) variable z. I am interested in predict de values of z with the known values of x1 and x2, so I am looking some expression that z=f(x1,x2). Let’s have a look at a plot of x1 vs. x2, color-coded by z, whose values range from -0.45 (blue) to +0.45 (red), being z\sim 0 in green:

So you can see there is a kind of gradient of z in the plot. As I don’t have an expression for the function f(x1,x2), y perform a step-by-step regression, starting just with linear terms, that is, x=b_0+b_1x1+b_2x2, and then adding more complex terms and comparing with LOO-CV. I keep adding new terms until the LOO-CV does not improve, so I end up with the expression:

z=b_0+b_1x1+b_2x2+b_3x1^2+b_4x2^2

That is, including the quadratic terms but not the product x1\cdot x2. So my stan model is:

```
model {
vector[N] mu;
mu = x * beta;
y ~ normal(mu, sigma);
beta ~ normal(0., 10.);
sigma ~ cauchy(0., 10.);
}
```

where x is the data matrix and \beta de vector of coefficients. But if I plot the results, I got:

This is the same plot as above, where I have only color coded the points with z=+0.3, 0.0 and -0.3 in red, green and blue, respectively. I have also included the curves z=f(x1,x2) for these three values of z. The green curve reproduces well the points, but for the red case, the curve lies above the points, and the contrary occurs in the blue case, where the curve lies below the corresponding points. So, if I use this fit to predict the values of z, the red points obtain a lower value, and for the blue points the predicted value of z is higher, as you can see in the following plot:

In this plot I represent the true values of z vs. the predicted ones using the fit. The solid line represents the 1:1 relationship and the the dashed one the actual tendency of the points. As you can see, larger (positive) values of z obtain lower predicted values, while lower (negative) values obtain higher predictions. I would expect that the predictions follow the 1:1 relationship.

Does anyone know where is the problem or what should I try? I am open to any ideas, so thank you for your time.