Issues with heterogeneously normalized data

The model:
Consider the vector \vec{v}=(v_x,v_y), whose normalized form is \hat{v}=\frac{1}{\sqrt{v_x^2+v_y^2}}(v_x,v_y). Now consider a situation in which you have N=300 such vectors, with the i^{th} vector computed in the following way:

v_x^{(i)} \sim \mathcal{N}\left(\alpha C^{(i,x)}_{\alpha}+\beta C^{(i,x)}_{\beta},\, \sigma \right) \tag{1}
v_y^{(i)} \sim \mathcal{N}\left(\alpha C^{(i,y)}_{\alpha}+\beta C^{(i,y)}_{\beta},\, \sigma \right), \tag{2}

where C^{(i,x)}_{\alpha}, C^{(i,x)}_{\beta}, C^{(i,y)}_{\alpha}, and C^{(i,y)}_{\beta} are pre-computed coefficients that depend on both the component and the vector under consideration, hence the i. Moreover, \alpha and \beta are the two internal parameters I wish to infer with Stan, and \sigma is the noise parameter.

To recap, the Stan model takes in all the C coefficients above as four distinct vectors of size N, as well as the actual data v_x and v_y as vectors of size N. Its job is to infer the parameters \alpha, \beta, and \sigma. For testing purposes, I am feeding Stan data generated with \alpha=0.3, \beta=1.5, and \sigma=0.1.
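For reference, the test data can be simulated with a fixed-parameter Stan program along the following lines (a minimal sketch; the vx_hat / vy_hat names for the normalized versions are just for illustration):

data {
   int<lower=1> N;
   vector[N] alpha_coeff_x;
   vector[N] beta_coeff_x;
   vector[N] alpha_coeff_y;
   vector[N] beta_coeff_y;
}
transformed data {
   real alpha = 0.3;    // ground-truth values used for testing
   real beta = 1.5;
   real sigma = 0.1;
}
generated quantities {
   vector[N] vx;        // raw draws from Eqs. (1) and (2)
   vector[N] vy;
   vector[N] vx_hat;    // heterogeneously normalized versions
   vector[N] vy_hat;
   for (i in 1:N) {
      vx[i] = normal_rng(alpha * alpha_coeff_x[i] + beta * beta_coeff_x[i], sigma);
      vy[i] = normal_rng(alpha * alpha_coeff_y[i] + beta * beta_coeff_y[i], sigma);
      vx_hat[i] = vx[i] / sqrt(square(vx[i]) + square(vy[i]));
      vy_hat[i] = vy[i] / sqrt(square(vx[i]) + square(vy[i]));
   }
}

This is run with algorithm=fixed_param, since there is no parameters block and only generated quantities are drawn.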

The issue:
When I feed Stan data generated precisely according to Eqs. (1) and (2), everything runs perfectly fine: Stan converges beautifully on the true internal and noise parameters. When tackling the real problem, however, I will be dealing with normalized vectors as input data. In my test scenario, this translates to generating data with Eqs. (1) and (2), computing \hat{v}^{(i)} from it, and feeding that to Stan. In this case, Stan needs to infer parameters from heterogeneously normalized data, noting that each vector has a different normalization factor.

I’ve tried two different approaches here in my Stan code:

  1. Compute the mean values in Stan from the C coefficients according to Eqs. (1) and (2), compute their magnitude, and normalize them; then apply the normal likelihood to the result. I have ensured that the incoming data, too, has its noise added after normalization. This approach results in almost all iterations raising the message that the Metropolis proposal is being rejected because, in normal_lpdf, the scale parameter is negative instead of > 0. I know what this means, but I cannot trace how or why I'm causing it.

  2. Introduce a third parameter, the normalization, which is a vector of size N. The idea here is that Stan will compute the non-normalized vectors, and in trying to infer \alpha and \beta, it will also infer the appropriate normalization for each vector. To my surprise, this method produces garbage, although it runs just fine with \hat{R} \approx 1.

I seek:

  1. To understand how and why my modeling is causing the issues discussed in Points #1 and #2.
  2. To understand how to reparametrize the model so that I can carry out this inference on heterogeneously normalized data.

My code:
If you end up running things, the Point #1 code should be commented out when running Point #2, and vice versa.

data {
   int N;     // number of examples in total (300)
   vector[N] vx;
   vector[N] vy;
   vector[N] alpha_coeff_x;
   vector[N] beta_coeff_x;
   vector[N] alpha_coeff_y;
   vector[N] beta_coeff_y;
}
transformed data {
}
parameters {
   real alpha;
   real beta;
   vector[N] norms;     // used in Point #2
   real sigma;
}
model {
   //true normalized values
   vector[N] vx_true;
   vector[N] vy_true;
   real norm = 0.;    // used in Point #1

   //Point #1: skip Point #2
   for (i in 1:N) {    
      vx_true[i] = alpha * alpha_coeff_x[i] + beta * beta_coeff_x[i];
      vy_true[i] = alpha * alpha_coeff_y[i] + beta * beta_coeff_y[i];
      norm = sqrt(vx_true[i]*vx_true[i] + vy_true[i]*vy_true[i]);
      vx_true[i] /= norm;
      vy_true[i] /= norm;
   }

   //Point #2: skip Point #1
   for (i in 1:N) {    
      vx_true[i] = norms[i]*(alpha * alpha_coeff_x[i] + beta * beta_coeff_x[i]);
      vy_true[i] = norms[i]*(alpha * alpha_coeff_y[i] + beta * beta_coeff_y[i]);
   }

   vx ~ normal(vx_true,sigma);
   vy ~ normal(vy_true,sigma);
}
generated quantities {
}
Declaring

real<lower=0> sigma;

is necessary to avoid the rejections you describe in Point #1: without the lower bound, the sampler is free to propose negative values of sigma, and normal_lpdf rejects those proposals because the scale parameter must be > 0.
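In other words, the parameters block would look something like the sketch below; constraining norms to be positive as well is my suggestion (it plays the role of an inverse magnitude, which can never be negative), not something the original code does:

parameters {
   real alpha;
   real beta;
   vector<lower=0>[N] norms;    // used in Point #2; an inverse magnitude, hence positive
   real<lower=0> sigma;         // lower bound keeps the normal scale parameter valid
}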

If I understand it correctly, (\hat{v}_x, \hat{v}_y) are the observed, normalised vectors, while the generative model is given by equations (1) and (2), with (\hat{v}_x, \hat{v}_y) = \frac{(v_x, v_y)}{\sqrt{v^2_x + v^2_y}}. The difficulty then is to write equations (1) and (2) in terms of (\hat{v}_x, \hat{v}_y).

However, I think you will run into a problem because the normalisation erases information (if I understand the problem correctly). After normalisation, \hat{v}_x^2 = 1 - \hat{v}_y^2, so for each vector you only have 1 observation instead of 2. So I think the best approach would be to find an expression for v_x as a function of the parameters and the coefficients, but I have not really gotten anywhere with that.
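To make that concrete: the normalised vector lies on the unit circle, so

(\hat{v}_x, \hat{v}_y) = (\cos\theta, \sin\theta), \qquad \hat{v}_x^2 + \hat{v}_y^2 = 1,

and observing \hat{v}_x determines \hat{v}_y up to a sign, leaving effectively one number per vector.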


I agree that normalization erases direction information, but Stan will still need the vector[N] vy input to get the directions right, no?

Additionally, I tested the following: based on the idea that normalization erases information, I completely commented out all lines of code related to vy and only considered vx as input. This naturally means choosing the code for Point #2 above. To my surprise, Stan was NOT able to fit when the input vx was normalized, but it did just fine when vx was not normalized. In this case, Stan has no notion of vy, so it shouldn't suffer from information having been erased.

Looking at how the raw data is affected by normalization, I think the issue is again that each data point gets divided by a different value, so the linear relationship that comes out of Eqs. (1) and (2) gets destroyed upon normalization. The following comes from generative modeling:

[Figure_1, Figure_2: plots of the generated data illustrating the effect of normalization]

You are right, the normalised \hat{v}_x is still affected by C^{(y)}_{\alpha} and C^{(y)}_{\beta}. Sorry if I gave the impression that this was not the case.

I think one possibility might be to model \frac{\hat{v}_x}{\hat{v}_y} = \frac{v_x}{v_y} (this works because the normalisation factor cancels out). This is the ratio of two Normal distributions. Since the scale in both is \sigma, you can divide both sides by \sigma, which also means that you cannot estimate \sigma (Edit: that's wrong, \sigma shows up in the location parameter after dividing). If the C^x's are independent from the C^y's, then the two normals are independent. I know that the ratio of two independent standard normals is a Cauchy distribution; however, I could not immediately figure out how to proceed with the means in equations (1) and (2). I hope it at least helps you to find a better solution.
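Spelling out the division by \sigma (assuming the C^x's and C^y's are independent): with \mu_x^{(i)} = \alpha C^{(i,x)}_{\alpha} + \beta C^{(i,x)}_{\beta} and \mu_y^{(i)} = \alpha C^{(i,y)}_{\alpha} + \beta C^{(i,y)}_{\beta},

\frac{v_x^{(i)}}{v_y^{(i)}} = \frac{v_x^{(i)}/\sigma}{v_y^{(i)}/\sigma}, \qquad \frac{v_x^{(i)}}{\sigma} \sim \mathcal{N}\left(\frac{\mu_x^{(i)}}{\sigma},\, 1\right), \qquad \frac{v_y^{(i)}}{\sigma} \sim \mathcal{N}\left(\frac{\mu_y^{(i)}}{\sigma},\, 1\right),

which is a ratio of two independent unit-variance normals. It reduces to a Cauchy only when both means vanish, and \sigma survives in the means, hence the Edit above.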

Thank you for the response. I guess the thing I don't understand is this: if the issue is information erased by normalization, then why do I still have problems when Stan only knows about vx and I tell it to infer only from normalized vx data? In that case there is no notion of vy, and the vx values should just be treated as plain values with no other information attached.

Because vx is a function of alpha_coeff_y and beta_coeff_y in the generative model (through the normalisation). I assume that you tried to recover vx from alpha_coeff_x and beta_coeff_x alone. Another way of looking at it is that in equations (1) and (2) the true values are independent of each other, while after normalisation the observed values depend on each other, which means that you need to model that dependency explicitly. So maybe it's not exactly correct to say that the normalisation is just erasing information; it also changes the relation between the observables.
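Written out explicitly, with \mu_x^{(i)} and \mu_y^{(i)} as above and \epsilon_x^{(i)}, \epsilon_y^{(i)} \sim \mathcal{N}(0, \sigma),

\hat{v}_x^{(i)} = \frac{\mu_x^{(i)} + \epsilon_x^{(i)}}{\sqrt{\left(\mu_x^{(i)} + \epsilon_x^{(i)}\right)^2 + \left(\mu_y^{(i)} + \epsilon_y^{(i)}\right)^2}},

so even the x-only observable depends on \alpha C^{(i,y)}_{\alpha} + \beta C^{(i,y)}_{\beta} through the denominator, and the relation between (\alpha, \beta) and \hat{v}_x^{(i)} is no longer linear.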

DUHH! Yes, you're absolutely right. Thank you. Even after I explicitly write the normalization in Eqs. (1) and (2) in terms of the C_x's and C_y's, Stan isn't able to infer properly.