Using residual plots to improve model specification for latent variable models

Hi everyone,

In traditional regression models you model y ~ f({x}). To move beyond a linear model, you can look at the residuals as a function of different variables, eg y-f(x) vs x_i, and then modify your function f({x}) to get a better fit.

I currently have a multivariate latent variable model which works as follows.
y ~ sum( {z} )
z ~ f({x})
That is, my multivariate z is unobserved. The number of terms in the sum is context dependent but typically around 5, so you are summing z_1 through z_5

How would one go about trying to diagnose poor model fit and thus modify the function f({x})? Standard residual plots are a little challenged… you can’t directly plot y vs x.

Grateful for any help!
Julian

For the first question, have you tried out the generated quantities block?

So if you had some sort of nonlinear regression, you could tack on a generated quantities that makes it easy to see what the fits look like at any parameterization:

data {
  int N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real a;
  real b;
  real c;
  real<lower=0.0> sigma;
}
model {
  y ~ normal(a + b * x + c * x .* x, sigma);
}
generated quantities {
  vector[N] yhat;
  for(n in 1:N)
    yhat[n] = normal_rng(a + b * x[n] + c * x[n] * x[n], sigma);
}

That just generates data that (hopefully) looks like the stuff coming in. You can take those and make plots of them to get a visual on what the fit is doing.

Hope this helps!

Thanks for the fast response. I understand using the generated quantities block for this purpose.

The problem is that you can’t ploy yhat vs individual x values easily because there is a latent function in the middle that obscures what is going on. There are multiple x values for each y.

Julian

Hmmm… maybe one option here is to do weight the underlying values by z.

So construct S_i = sum(z*x_i)/sum(z), and then plot residuals against S. At least this can collapse the multiply underlying values to a single value where there is the most weight given in the function y.

Do you have the x’s as state parameters?

The xs are data points.

You can imagine that the first version of f(x) is just a linear regression, ie f(x_a) = beta * x_a + c.

So example this model might look like:
z_1a = beta_a * x_1a + c_a
z_1b = beta_b * x_1b + c_b
z_1c = beta_c * x_1c + c_c
y_1 = z_1a + z_1b + z_1c
(Now you wouldn’t use a latent variable model for this if this was linear, but my f(x) is actually a non-linear function

All the xs are data variables and pre-specified.

Other constraints:

  • Each x has the same length and corresponding entries represent the same quantities
  • Each beta has the same length and corresponding entries represent the same quantities
  • The sum doesn’t necessarily have the same number of terms. So sometimes there is a sum of 2 latent variables, sometimes 5 or more…

Sorry, I wrote the wrong thing, I meant to ask if the z’s are parameters in your model.

The "z"s are defined as transformed parameters. The parameters in the model are the set of betas. zs are defined functionally based on the betas and xs (there is no distribution/sampling on them)

Whoops, my bad. So it’s plotting stuff in higher dimensions you’re after? The only suggestion I have (and I’m not sure it’s very good) is if your ‘z’ variables happen to be super correlated, then maybe you could try a PCA to get a lower dimensional (2D or 3D) space to plot in? But if your zs are highly correlated like this then you’d probably want to change your model until they aren’t correlated, so this probably isn’t very useful :/.

I don’t understand why y vs x plots aren’t helpful. Just plot all your ys vs various xs, filter the values of x that give you suspiciously large residuals, and give those a special color and look for patterns. You could even filter out values of y that don’t depend on a given x in certain plots. They’re certainly not gonna be as easy to understand as when x is 1D, but should still be informative :D.

Maybe making it more specific will help. (I can’t give away the original problem because it’s commercially sensitive unfortunately… let’s try something simple)

Suppose that the problem is about classes in a school raising money in a school fundraiser.

  • The classes have unequal sizes
  • The dependent variable here (y_i) is the total amount of money raised by each class in a school fundraiser.
  • You observe the amount raised by each class but not the amount raised by the individual students
  • Lets say you have 50 classes in the school for argument’s sake
  • The x variables are the age of each student, their score on the latest maths test and their score on the latest English test. Call x_j the set of x variables for the jth student
  • Define z_j to be the amount raised by the jth student
  • You hypothesise some not-necessarily-linear function that predicts the amount a given student will raise as a function of these x variables, f(z_j | x_j)
  • The amount predicted to be raised by each class is p_i = sum(z_ij), ie the sum of all the contributions from students in class i
  • You assume that your model has normal errors, so that y_i ~ normal(p_i, sigma) for some sigma

Recall that you only observe p_i, not z_i. How do you diagnose whether your hypothesis about the structural form of z_j = f(x_j) is right?

Do you plot (y_i - p_i) as a function of the average value of x_j for a given class? This doesn’t directly tell you about the non-linear relationship. Additionally, E( f ( x_j ) ) is not necessarily f( E( x_j ) ). What you’d really like to do is know what each student raised, and plot that as a function of the x values. But you can’t, because this is unobserved.

One option is that you do something like put in a spline model and just plot what that spline function looks like.

Thoughts?

Edit: For simplicity’s sake you can assume that f(x_j) is g(age).h(maths score).k(English score) or g(age) + g(maths score) + k(English score) for some functions, ie that the problem is separable in the 1D variables

I’ll back out of here now – I’m less familiar with this than other people might be, but if you’re trying to explain the total money raised by a class (y_i s) in terms of features of the students in that class (x_i s), then it still seems to me comparing predicted y vs x with measured y vs x is going to be informative.

How do you diagnose whether your hypothesis about the structural form of z_j = f(x_j) is right?

That’s probably a much harder question than just getting the fit to look right.

Maybe an easier place to start is predicting money donated by a class in terms of the group characteristics of that class (instead of trying to infer stuff per-student)? There’s gotta be a Vignette around here somewhere on stuff like this. Check out the Rstanarm ones. This seems like more of a problem of setting up a big glm in Rstanarm and looking at estimates of the effect coefficients. This was the closest thing I saw (Estimating Regularized Linear Models with rstanarm).

Seems like a clear explanation of the problem though! Thanks for typing that out.

Don’t you observe y[i], not p[i]? Then I don’t see a problem fitting this model, because now there’s no hard constraints on individual students. Then once you have the posterior, you can do inference on the coefficients of the various predictors. Usually we’d evaluate by doing something like cross-validation. Specifically, estimate parameters condtioned on 49 classes, then use posterior predictive inference (i.e., full Bayes) to predict donations from the 50th class and evaluate how good those predictions are. You get estimates of what each student raised, but given that these aren’t observed, there’s much you can do with them to validate at that level.

Sorry @Bob_Carpenter, you’re right that you observe y[i] not p[i]. There’s no issue at all in fitting the model. The question is more around how you can determine what the functional form is.

@bbbales2 - I agree with you that the better approach would be to model the group characteristics directly. Unfortunately the original problem specification doesn’t allow this. Observations are only at the group level but you need individual-level observations, and (to continue the analogy) both the number and attributes of the students differs radically between the classes. Also, the problem is defined in terms of a sum of individual attributes.

Anyway, thanks for taking the time to consider this. I think I’m going to model the individual elements something like a GAM, ie separable in the variables but with splines instead of linear functions.