Variance model

Hi,

I am able to run a latent-variable Gaussian process model successfully on several data sets (A or B) separately and on their combination (A+B). However, when I change the variance model from y \sim normal(f, \sigma) to y \sim normal(f, \sigma_1 + \sigma_2*y*(1-y)), I am able to run the model on A but not on B. Is there a good way to find out why?
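
In Stan terms the change is roughly this (a sketch; sigma1 and sigma2 are just my names for the two parameters):

  // constant-variance observation model
  y ~ normal(f, sigma);
  // heteroskedastic version: runs on A but not on B
  y ~ normal(f, sigma1 + sigma2 * y .* (1 - y));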

Thanks for any suggestions.

IMHO, the mean and variance interact, so it may be wrong to think of f as if it were constant between the two cases. Would you mind unpacking f?

Thanks a lot. What do you mean by unpacking f?

Writing down what it represents :-) I assumed f was a function of other variables.

In this case f \sim multivariate\_normal(0, K(X \mid \alpha, \rho)), where X is the design matrix (inputs), K is the squared exponential kernel, and \alpha, \rho are its parameters. I am using the Cholesky decomposition.
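
Roughly like this (an untested sketch with placeholder names; non-centered parameterization via the Cholesky factor):

  data {
    int<lower=1> N;
    int<lower=1> D;
    array[N] vector[D] X;     // rows of the design matrix
  }
  parameters {
    real<lower=0> alpha;      // marginal standard deviation
    real<lower=0> rho;        // length scale
    vector[N] eta;            // standard normal draws
  }
  transformed parameters {
    // squared exponential kernel with a small jitter for numerical stability
    matrix[N, N] L = cholesky_decompose(
        add_diag(gp_exp_quad_cov(X, alpha, rho), 1e-9));
    vector[N] f = L * eta;
  }
  model {
    eta ~ std_normal();
    // ... priors on alpha and rho, plus the observation model for y ...
  }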

I’d suggest starting by working out whether the model is divergent, unidentifiable, or multimodal.

The first may be flagged by divergent transitions and failure to converge in at least one chain.

The second may be flagged by chains which all converge, but on different values for different parameters. Which parameters the chains don’t agree on may be a useful clue.

The third … I’m not sure … in low-dimensional problems one can tell by drawing a graph of the parameters’ log density. I don’t know what to do when there is a tonne of parameters. It probably looks a lot like case two on the face of it.

Maybe @maxbiostat knows more?

The problem is that the model runs for data set A and gives good results. When I run the model with data set B or A+B, the sampler stops at 0%. I can save the warmup draws to a file, but how do I figure out from the warmup what is going on?

This looks like a recipe for bad geometry. Is there a strong justification for this model?

The warmup iterations should be converging … HMC allegedly discovers the “typical set” pretty early on, so you should be able to see something about the direction in which each chain is going, or not going, from the warmup, so far as I know.

Is dataset B qualitatively different? Does it contain zeros or negative numbers or something that isn’t so abundant in A? It may be useful to consider how the parameters interact with the data. Say you halve dataset B; what happens? How small can you get dataset B and still reproduce the behaviour? If you can get the data down to something very small, you may be able to think more easily about why it does what it does.

There is a physical justification. The model with constant variance works (with darn good predictions) for A, B, and A+B, and with various factor combinations (columns of X). Thus I would attribute this to data set B being incompatible with the non-constant-variance model. I think that cmdstan just hung on me. I would like to find out why.

I’d like to see more on that, then. Clearly, y is assumed to lie between 0 and 1, otherwise the variance parameter would not make much sense.

Here I’m afraid I can’t help much.

I really don’t know why it would hang. As in no CPU usage, just not doing anything? Hopefully others can be more helpful to you than me on this one.

Looking at the warmup helped. The variance model I was trying to implement was \sigma_1 - \sigma_2*y*(y - 2*y_{max}): physics tells us that the variance is lowest at the edges of [0, 1] and highest at y_{max}. The sampler stopped when the draw of \sigma_1 became 1e-5. However, the mixing for \sigma_2 is rather good:
(trace plot for \sigma_2)

I think the culprit was how I implemented the variance model. Any suggestions on how to implement the variance “hump”?

The variance has to be positive for the normal distribution. It looks possible to me that it becomes negative in your model?
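
For example, just plugging the endpoints into \sigma_1-\sigma_2*y*(y-2*y_{max}): at y=0 the scale is \sigma_1, at y=y_{max} it is \sigma_1+\sigma_2*y_{max}^2, but at y=1 it is \sigma_1+\sigma_2*(2*y_{max}-1), which goes negative whenever y_{max}<1/2 and \sigma_2>\sigma_1/(1-2*y_{max}).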

y and y_{max} are in [0, 1].


You could try artificially constraining sigma to be at least a small distance above zero, to see whether the 1e-5 is relevant or whether it happens anyway. It may be useful to share the graph?
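
Something like this, purely as a diagnostic (the 0.01 is an arbitrary placeholder, not a recommendation):

  parameters {
    real<lower=0.01> sigma1;   // temporary floor, only to test whether the 1e-5 draws matter
    // ... other parameters unchanged ...
  }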

I made this function with c>0.
vector standard_deviation(vector sigma, real c, vector y) {
  real ymax = 0.5 * ((sigma[2] - sigma[1]) / c + 1);
  return sigma[1] + c * y .* (rep_vector(2 * ymax, num_elements(y)) - y);
}

I am getting the same problem with a slightly revised variance model (the s1 draw goes to 1e-5 and then the sampler stops):

vector standard_deviation(real s1, real c, vector y) {
  real ymax = 0.5;
        
  return s1 + c * y .* (rep_vector(2 * ymax, num_elements(y)) - y);
}

real lp = normal_lpdf(y | mu, standard_deviation(s1, c, y));

Can I rewrite it as:

lp = normal_lpdf((y - mu) ./ standard_deviation(s1, c, y) | 0, 1)
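
Or does the standardized form need the log-scale term added back, since the scale depends on the parameters here? Something like this rough sketch:

  vector[N] sd_y = standard_deviation(s1, c, y);
  lp = normal_lpdf((y - mu) ./ sd_y | 0, 1) - sum(log(sd_y));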

I would guess there are more hump-shaped variance models. From physical considerations, the variance is lowest at the edges and highest in the middle.

Can you perhaps reproduce the problem on some synthetic data and post it all here? I could then tinker with it, I’m out of ideas frankly.

So is y Bernoulli distributed? If you write y*(1-y), then this is its variance, but you are fitting a normal distribution, where the scale parameter is the square root of the variance.

My 2 cents: the closest thing that would make sense is to use a beta distribution.
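
Something along these lines, for instance (a rough, untested sketch; kappa is a placeholder precision parameter, and y would have to lie strictly inside (0, 1)):

  parameters {
    real<lower=0> kappa;              // precision-like parameter
    // ... GP parameters for f as before ...
  }
  model {
    vector[N] mu = inv_logit(f);      // latent GP mapped to (0, 1)
    y ~ beta_proportion(mu, kappa);   // variance = mu .* (1 - mu) / (1 + kappa)
  }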
