Spline for latent variables

Hi,
In a model I have two latent variables f1, f2, for example

y_{1} \sim gp(f_1, \sigma_1), \quad y_2 \sim gp(f_2, \sigma_1)

Both f1 and f2 are vectors with length n. I want to model the auto-regression between them: f2 = some function of f1+ error.
It is straightforward to run a linear or polynomial regression: f_2= \mathrm{normal} ( \beta f_1, \tau).

Now I want to make it more flexible so I am using spline: f2 = B_spine (f1) + error.

The spine in stan is very slow for this use. I think the reason is that the spine basis function is reconstructed each time for a new value of f1. In contrast, in a spine regression, the basis function is fixed as long as the covariate X is fixed.

Is spine bad for modeling latent parameters? What is the good alternative (to a slow spine or a linear regression) for this purpose?

Thanks.

2 Likes

Slow for a single log density evaluation or slow because there is a difficult dependency in the posterior and thus there are many log density evaluations per iteration?

1 Like

This is my (anecdotal) experience with latent splines. I attempted to employ them in this model instead of the time-series, but the number of log-density evaluations was prohibitive.

1 Like

The difficult posterior in latent variable models is the reason why, for example, INLA and hundreds papers on variational/EP/Laplace inference for GPs exist.

2 Likes

Some update here:

  1. The model I described is slow but eventually fine.
  2. This is the example in which a good adaptation is important. The spline models implies a zero function outside its knots— but we do not really know the support of the latent function f (the mean function of gp). In other words, there is only a very thin region of the joint space, defined by the span of knots, on which the gradient of the spline coefficients are non-zero. If the adaptation fails to find that region, the HMC update will be very slow.
  3. As a results, it is often the case that multiple chains have very different warm-up time (and very slow) but similar sampling time.
  4. It helps A LOT to initialize the whole model by MAP. I think due to some conceptual debate “typical set”, we are reluctant to do so in general. Of course, for a linear regression, the hessian of beta is constant and a bad initialization is less harmful. For spline, the sampler has to enter that thin slice of joint space to have non-zero gradient update. I suspect this MAP initialization can be useful for a general type of latent variable models.
  5. Maybe another way to avoid the vague specification of support of f is to use gp to replace B-splines:
    f_2-f_1= h(f1), h\sim \mathcal{GP}. (f_1 f_2 are separate gp latent functions from the likelihood). But then I have to compute a new cov matrix in each iteration.
1 Like