Mixture Model with joint probability

Hi,

I am very new to Stan, and I am trying to program a mixture model with a joint probability for clustering/classification of data.
I have two data sets (descriptors), distances and angles, and I am trying to read in both and use both to find classes in my data points.

For now I have a model that uses one data set and finds classes, but I have no idea how to extend it to include both data sets.

My code:

    data {
      int K;              // number of classes
      int N;              // number of data points
      vector[N] y;
    }
    parameters {
      vector<lower=0, upper=16>[K] means;
      vector<lower=0>[K] sigma;
      simplex[K] weights;
    }
    model {
      for (n in 1:N) {
        vector[K] tmp;
        for (k in 1:K) {
          tmp[k] = log(weights[k]) + normal_lpdf(y[n] | means[k], sigma[k]);
        }
        target += log_sum_exp(tmp);
      }
    }

I would appreciate any help!

Looks like you’ll need to make y an array of N 2-vectors, like this:

    vector[2] y[N];

Your means will be a K-length array of 2-vectors:

    vector[2] means[K];

Your sigma will be a K-length array of covariance matrices:

    matrix[2,2] sigma[K];

Then you can use the multivariate normal in place of your normal_lpdf, and you will need to estimate a covariance matrix instead of just a standard deviation. Instead of jumping straight into that, you might want to code up fitting a simple multivariate normal to some data first, so you get all the syntax down.
There’s also a nice section in the manual on optimizing the MVN and what priors to use.
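For example, a minimal sketch of fitting a single bivariate normal might look like this (the normal(0, 10) prior on mu is just a placeholder assumption, not a recommendation):

    data {
      int N;
      vector[2] y[N];        // paired observations, e.g. (distance, angle)
    }
    parameters {
      vector[2] mu;          // mean vector
      cov_matrix[2] Sigma;   // covariance matrix, constrained positive definite
    }
    model {
      mu ~ normal(0, 10);    // placeholder prior; adjust to your scale
      for (n in 1:N)
        y[n] ~ multi_normal(mu, Sigma);
    }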

Hope that helps.

Krzysztof

Thank you very much! I am going to try this; hopefully I get it right.

So I am not sure I understand this: is just ‘replacing’ the vectors and the normal_lpdf not sufficient?

    data {
      int K;               // number of classes
      int N;               // number of all data points
      vector[2] y[N];
    }
    parameters {
      vector[2] means[K];
      matrix[2,2] sigma[K];
      simplex[K] weights;
    }
    model {
      for (n in 1:N) {
        vector[K] tmp;
        for (k in 1:K) {
          tmp[k] = log(weights[k]) + multi_normal_lpdf(y[n] | means[k], sigma[k]);
        }
        target += log_sum_exp(tmp);
      }
    }

Also, where exactly do I find the section on optimizing the MVN? Is this p 147/148?

You need to declare covariance matrices as such to make sure they remain positive definite. It’s even better to use Cholesky factors of covariance matrices.
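That is, instead of matrix[2,2], the constrained declaration would be:

    cov_matrix[2] sigma[K];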

You may have problems here because you don’t have a prior on means or sigma (I’d suggest mu and Sigma here to follow conventional notation; means and covariance would also be consistent).

You also want to compute those log weights only once, so all together, that’d look like:

    parameters {
      cholesky_factor_cov[2] L_Sigma[K];
      vector[2] mu[K];
      ...
    model {
      vector[K] log_weights = log(weights);
      ...
        tmp[k] = log_weights[k] + multi_normal_cholesky_lpdf(y[n] | mu[k], L_Sigma[k]);

We like using Cholesky factors of correlation matrices with an LKJ prior, then scaling them, also with priors.
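A sketch of that construction (the LKJ shape parameter of 2 and the half-Cauchy prior on the scales are placeholder choices, not recommendations):

    parameters {
      cholesky_factor_corr[2] L_Omega[K];  // Cholesky factors of correlation matrices
      vector<lower=0>[2] tau[K];           // per-dimension scales
      ...
    transformed parameters {
      matrix[2,2] L_Sigma[K];              // Cholesky factors of covariance matrices
      for (k in 1:K)
        L_Sigma[k] = diag_pre_multiply(tau[k], L_Omega[k]);
    }
    model {
      for (k in 1:K) {
        L_Omega[k] ~ lkj_corr_cholesky(2); // placeholder LKJ shape
        tau[k] ~ cauchy(0, 2.5);           // placeholder half-Cauchy scale prior
      }
      ...

With that in place, the multi_normal_cholesky_lpdf call above can use L_Sigma[k] directly.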

P.S. Sorry this didn’t get answered sooner—I was missing messages because you have to explicitly tell Discourse to let you know about new messages in topics in which you haven’t participated.