Multilevel Gaussian mixture regression model


#1

Help needed in modelling

Implementing a multilevel model as follows:

Level 1 :

y = intercept + x * slope + error

Level 2 :

slope represents Gaussian Mixture Model

=> GMM(slope | mean, S.D)

data {
    int<lower=1> N; // number of subjeccts (assumed numbered 1 to N)
    int<lower=1> D; // number of data points
    real<lower=0,upper=100> y[D];  // Feature
    real<lower=0> time[D];        // timing of visits
    int<lower=1,upper=N> sub[D];  // ID of subjects
    int K; // Number of gaussians
}

parameters {

    // Level 1 Parameters for Linear Model.

    real a[N];              // intercept, varies by subject
    real b[N];              // slope over time, varies by subject
    real<lower=0> err;      // error of the regression

    // Level 2 Parameters for Gaussian Mixture model.

    ordered[K] mu;
    real<lower=0, upper=1> theta;
    real<lower=0> sigma[K];


}

transformed parameters {
    real y_hat[D];
    for (i in 1:D) {
        y_hat[i] = a[sub[i]] + b[sub[i]] * time[i];
    }
}

model {
    // the hyper-model
    for (j in 1:N) {
        a[j] ~ normal(0,1);
        b[j] ~ normal(0,1);
    }




    // the base level of the model
    for (i in 1:D) {
        y[i] ~ normal(y_hat[i],err);
    }


    sigma ~ normal(0, 1);
    mu ~ normal(0, 1);
    theta ~ beta(5, 5);

   for (n in 1:N){
            target += log_mix(theta,
                     normal_lpdf(b[n] | mu[1], sigma[1]),
                     normal_lpdf(b[n] | mu[2], sigma[2]));

       }


}
  1. Would like to know whether the model is correct.
  2. Experiencing Low E-BFMI.

#2

One thing I’ve noticed is that b influences the density twice:

b[j] ~ normal(0,1);

and

           target += log_mix(theta,
                     normal_lpdf(b[n] | mu[1], sigma[1]),
                     normal_lpdf(b[n] | mu[2], sigma[2]));

I.e. you are saying the b should belong to two distributions at the same time. This could easily lead to the problems you describe. You probably want to remove the first one.

Also not that the sampling statements are vectorized, so you can have:

a ~ normal(0,1);

y ~ normal(y_hat,err);

avoiding the for loops and slightly increasing performance.