# Problematic posterior fit for 2-component Gaussian mixture model

Dear Stan users:

I am currently fitting a single component and a 2-component Gaussian model to a real data set and using WAIC to select the preferred model.

I simulated a 2-component Gaussian model (having the data structure exactly like my real data to be modelled) and fit both correct 2-component and incorrect 1-component model using a customised-constructed loglikelihood function and my model is able to pick 2-component as having a low WAIC value and also both models converged (no divergent warnings, Rhat=1).

However, when I fit my model to the real data set ( which is suspected to have 2 components), I cannot get my model converged not matter how long I ran.

Actually, my customised likelihood function model is only to fit on group-level summary statistics of the data set (using 5-order summary stats) rather than all of the underlying data points and to verify I also just fit the same model using all of my classical observations and I can get a nice result and WAIC (for fitting all of the classical data points rather than just use 5-order statistics) also picked the 2-component model to be superior than the 1-component model.

I do not understand why is this the case. Can anyone please provide some suggestions as how I can figure out where it goes wrong? ( in the sense that why it works for the simulated data but it does not work for the real data and I cannot get the model fitted on the real data to converge)

Below is my model code for the customised-constructed likelihood function where it uses only 5-order summary statics for a 3-hour block rather than using all observations within this 3-hour block

``````//For QUT data with 3-hour block interacting with day of the week and day of the year effects
//Fitting with the whole 17+24 days of data
functions{

int r_in(int pos,int[] pos_var) {
int pos_match;
int all_matches[size(pos_var)];

for (p in 1:(size(pos_var))) {
all_matches[p] = (pos_var[p]==pos);
}

if(sum(all_matches)>0) {
pos_match = 1;
return pos_match;
} else {
pos_match = 0;
return pos_match;
}

}

real varying_lpdf(real [] y, int N, int[] kk, int[] index1, int[] index2, int[] index_11, int[] index_12, int[] index_18, int[] index_25, int[] index_26, int[] hour_index, int Ncol, int Ncol_complete,
int index_length, matrix ordered_effects_gen, real sigma, vector lambda_gen){

vector[N] cdf;
vector[N] sum_pdf;
vector[index_length] diff;
vector[Ncol] log_prob;
vector[Ncol] col_sum_pdf;

log_prob=rep_vector( 0, Ncol);
col_sum_pdf = rep_vector(0, Ncol);

for(n in 1:N){
cdf[n] =  (1-lambda_gen[kk[n]]) * normal_cdf(y[n], ordered_effects_gen[kk[n],1], sigma) +lambda_gen[kk[n]] *normal_cdf(y[n],ordered_effects_gen[kk[n],2], sigma) ;
sum_pdf[n] =   log_mix((1-lambda_gen[kk[n]]), normal_lpdf(y[n]| ordered_effects_gen[kk[n],1], sigma), normal_lpdf(y[n]| ordered_effects_gen[kk[n],2], sigma));

}

for(n in 1:index_length){

diff[n] = (cdf[index2[n]] - cdf[index1[n]]);

}

for(n in 1:Ncol_complete){

col_sum_pdf[hour_index[n]] =  sum_pdf[5*n-4] + sum_pdf[5*n-3] +sum_pdf[5*n-1] +sum_pdf[5*n-1]+sum_pdf[5*n];

//for 3-hour blocks where there are 18 obs
if(r_in(hour_index[n], index_18)){
log_prob[hour_index[n]] = 5 * log( diff[4*n-3] ) + 2*log(diff[4*n-2]) + log(diff[4*n-1]) + 5* log(diff[4*n]);
}

if(r_in(hour_index[n], index_25)){
log_prob[hour_index[n]] = 5 * log( diff[4*n-3] ) + 5*log(diff[4*n-2]) + 5* log(diff[4*n-1]) + 5* log(diff[4*n]);
}

if(r_in(hour_index[n], index_26)){
log_prob[hour_index[n]] = 5 * log( diff[4*n-3] ) + 2*log(diff[4*n-2]) + log(diff[4*n-1]) + 6* log(diff[4*n]);
}

if(r_in(hour_index[n], index_11)){
log_prob[hour_index[n]] = 2* log( diff[4*n-3] ) + 2*log(diff[4*n-2]) + log(diff[4*n-1]) + log(diff[4*n]);
}

if(r_in(hour_index[n], index_12)){
log_prob[hour_index[n]] = 2 * log( diff[4*n-3] ) + 2*log(diff[4*n-2]) + log(diff[4*n-1]) + 2* log(diff[4*n]);
}

}

return(sum(log_prob)+sum(col_sum_pdf));
}

vector log_lik_vec_gen( int Ncol, int Ncol_complete, int[] index_11, int[] index_12, int[] index_18, int[] index_25, int[] index_26, int[] hour_index, vector diff_gen, vector sum_pdf_gen ){
vector[Ncol] logg_prob1;
vector[Ncol] col_sum_pdf_gen;

logg_prob1=rep_vector( 0, Ncol);
col_sum_pdf_gen = rep_vector( 0, Ncol);

for(n in 1:Ncol_complete){
col_sum_pdf_gen[hour_index[n]] =  sum_pdf_gen[5*n-4] + sum_pdf_gen[5*n-3] +sum_pdf_gen[5*n-1] +sum_pdf_gen[5*n-1]+sum_pdf_gen[5*n];

if(r_in(hour_index[n], index_18)){
logg_prob1[hour_index[n]] = 5 * log( diff_gen[4*n-3] ) + 2*log(diff_gen[4*n-2]) + log(diff_gen[4*n-1]) + 5* log(diff_gen[4*n]);
}

if(r_in(hour_index[n], index_25)){
logg_prob1[hour_index[n]] = 5 * log( diff_gen[4*n-3] ) + 5*log(diff_gen[4*n-2]) + 5* log(diff_gen[4*n-1]) + 5* log(diff_gen[4*n]);
}

if(r_in(hour_index[n], index_26)){
logg_prob1[hour_index[n]] = 5 * log( diff_gen[4*n-3] ) + 2*log(diff_gen[4*n-2]) + log(diff_gen[4*n-1]) + 6* log(diff_gen[4*n]);
}

if(r_in(hour_index[n], index_11)){
logg_prob1[hour_index[n]] = 2* log( diff_gen[4*n-3] ) + 2*log(diff_gen[4*n-2]) + log(diff_gen[4*n-1]) + log(diff_gen[4*n]);
}

if(r_in(hour_index[n], index_12)){
logg_prob1[hour_index[n]] = 2 * log( diff_gen[4*n-3] ) + 2*log(diff_gen[4*n-2]) + log(diff_gen[4*n-1]) + 2* log(diff_gen[4*n]);
}

}

return(logg_prob1+col_sum_pdf_gen);
}
}

data {
int<lower=1> N; //==1660 fully observed observations

int<lower=1> Ncol; //==352 total 3-hour blocks to be estimated

int<lower=1> Ncol_complete; //332 completely observed 3-hour blocks

int<lower=1> index_length; //1328

int<lower=1, upper=352> kk[N];//row index for fully observed observations

real<lower=0> y[N];

int<lower=0>  index1[index_length];

int<lower=0>  index2[index_length];

int<lower=0>  index_11[1];

int<lower=0>  index_12[1];

int<lower=0>  index_18[143];

int<lower=0>  index_25[56];

int<lower=0>  index_26[131];

int<lower=0>  hour_index[Ncol_complete];
int<lower=1> D;// indicating number of days fitted

int<lower=1> K;// indicating hour of weeks 56

matrix[K, K] K_TD; //penalty prior precision matrix for hour of the week coefficients

int num_basis;

matrix[D, num_basis] B_annual;

matrix[K, (num_basis-1)] eta_bspline;

}

parameters {

vector[num_basis] beta_annual_raw;//bspline coefficients for annual effects

real<lower=0> sigma;

vector[(num_basis-1)] beta_eta_raw;

real mu_beta_eta;

real<lower=0> sigma_beta_eta;
//real<lower=0> tau;//RW penalty scale for beta_eta

ordered[2] beta[K];// ordered hour of week effects: a matrix of 56 * 2 where 2nd col
//is greater than 1st col for all rows.

}

transformed parameters {
vector[(num_basis-1)] beta_eta;

vector[2] alpha;//intercept of hour of week

matrix[K,2] beta_adj;//sum to zero hour of week component mean

vector[num_basis] beta_annual;//coefficients for annual effects

beta_annual = beta_annual_raw*sigma;

beta_eta = mu_beta_eta + sigma_beta_eta * beta_eta_raw;

//beta_eta[1] =beta_eta_raw[1];

//for(i in 2:(num_basis-1)){
//beta_eta[i] = beta_eta[i-1] + beta_eta_raw[i] * tau;
//}

alpha[1] = mean(beta[,1]);
alpha[2] = mean(beta[,2]);
for(t in 1:K){

}

beta_annual = beta_annual_raw* sigma;

}

model{
vector[Ncol] annual_effects_gen;

matrix[Ncol,2] ordered_effects_gen;

vector[K] lambda;

vector[Ncol] lambda_gen;

beta_annual_raw~ normal(0, 1);//prior for the slopes following Gelman 2008

sigma ~cauchy(0,2.5);

beta_eta_raw~std_normal();

sigma_beta_eta~std_normal();

mu_beta_eta~std_normal();

lambda = inv_logit(eta_bspline* beta_eta);

for( t in 1:6){
lambda_gen[(1+(t-1)*56): (56*t) ] = lambda;
}

for(t in 337: Ncol){
lambda_gen[t] = lambda[t-(K*6)];

}

for(t in 1:D){
annual_effects_gen[(1+(t-1)*8) : (8*t)] = rep_vector( (B_annual * beta_annual)[t], 8);
}

//44-day there are whole 6 weeks plus 2 days (16 3-hour blocks)
//week 1-6
for (t in 1:Ncol){
if(t <=K){
ordered_effects_gen[t,1] = alpha[1] + beta_adj[t,1] + annual_effects_gen[t];
ordered_effects_gen[t,2] = alpha[2] + beta_adj[t,2] + annual_effects_gen[t];
}
if(t >=(K+1) && t<= (K*2))
{
ordered_effects_gen[t,1] = alpha[1] + beta_adj[t-K,1] + annual_effects_gen[t];
ordered_effects_gen[t,2] = alpha[2] + beta_adj[t-K,2] + annual_effects_gen[t];
}

if(t >=(2*K+1) && t<= (K*3))
{
ordered_effects_gen[t,1] = alpha[1] + beta_adj[t-(2*K),1] + annual_effects_gen[t];
ordered_effects_gen[t,2] = alpha[2] + beta_adj[t-(2*K),2] + annual_effects_gen[t];
}

if(t >=(3*K+1) && t<= (K*4))
{
ordered_effects_gen[t,1] = alpha[1] + beta_adj[t-(3*K),1] + annual_effects_gen[t];
ordered_effects_gen[t,2] = alpha[2] + beta_adj[t-(3*K),2] + annual_effects_gen[t];
}

if(t >=(4*K+1) && t<= (K*5))
{
ordered_effects_gen[t,1] = alpha[1] + beta_adj[t-(4*K),1] + annual_effects_gen[t];
ordered_effects_gen[t,2] = alpha[2] + beta_adj[t-(4*K),2] + annual_effects_gen[t];
}

if(t >=(5*K+1) && t<= (K*6))
{
ordered_effects_gen[t,1] = alpha[1] + beta_adj[t-(5*K),1] + annual_effects_gen[t];
ordered_effects_gen[t,2] = alpha[2] + beta_adj[t-(5*K),2] + annual_effects_gen[t];
}

}

//remaining 16 of 3-hour blocks
for(t in 337:Ncol){
ordered_effects_gen[t,1] = alpha[1] + beta_adj[t-(K*6),1] + annual_effects_gen[t];
ordered_effects_gen[t,2] = alpha[2] + beta_adj[t-(K*6),2] + annual_effects_gen[t];
}

target += multi_normal_prec_lpdf (beta_adj[,1]| rep_vector(0, K),( 1/square(sigma) * K_TD));

target += multi_normal_prec_lpdf (beta_adj[,2]| rep_vector(0, K),( 1/square(sigma) * K_TD));

//likelihood

target += varying_lpdf( y| N,  kk,  index1,  index2, index_11,  index_12,  index_18,  index_25, index_26, hour_index,  Ncol, Ncol_complete,  index_length, ordered_effects_gen, sigma, lambda_gen);

}

generated quantities{

vector[Ncol] annual_effects_gen;

matrix[Ncol,2] ordered_effects_gen;

vector[K] lambda;

vector[Ncol] lambda_gen;

vector[Ncol] log_lik_vec;//loglikelihood per time period

vector[N] cdf1;

vector[N] sum_pdf_gen;

vector[index_length] diff_gen;

lambda = inv_logit(eta_bspline* beta_eta);

for( t in 1:6){
lambda_gen[(1+(t-1)*56): (56*t) ] = lambda;
}

for(t in 337: Ncol){
lambda_gen[t] = lambda[t-(K*6)];

}

for(t in 1:D){
annual_effects_gen[(1+(t-1)*8) : (8*t)] = rep_vector( (B_annual * beta_annual)[t], 8);
}

//44-day there are whole 6 weeks plus 2 days (16 3-hour blocks)
//week 1-6
for (t in 1:Ncol){
if(t <=K){
ordered_effects_gen[t,1] = alpha[1] + beta_adj[t,1] + annual_effects_gen[t];
ordered_effects_gen[t,2] = alpha[2] + beta_adj[t,2] + annual_effects_gen[t];
}
if(t >=(K+1) && t<= (K*2))
{
ordered_effects_gen[t,1] = alpha[1] + beta_adj[t-K,1] + annual_effects_gen[t];
ordered_effects_gen[t,2] = alpha[2] + beta_adj[t-K,2] + annual_effects_gen[t];
}

if(t >=(2*K+1) && t<= (K*3))
{
ordered_effects_gen[t,1] = alpha[1] + beta_adj[t-(2*K),1] + annual_effects_gen[t];
ordered_effects_gen[t,2] = alpha[2] + beta_adj[t-(2*K),2] + annual_effects_gen[t];
}

if(t >=(3*K+1) && t<= (K*4))
{
ordered_effects_gen[t,1] = alpha[1] + beta_adj[t-(3*K),1] + annual_effects_gen[t];
ordered_effects_gen[t,2] = alpha[2] + beta_adj[t-(3*K),2] + annual_effects_gen[t];
}

if(t >=(4*K+1) && t<= (K*5))
{
ordered_effects_gen[t,1] = alpha[1] + beta_adj[t-(4*K),1] + annual_effects_gen[t];
ordered_effects_gen[t,2] = alpha[2] + beta_adj[t-(4*K),2] + annual_effects_gen[t];
}

if(t >=(5*K+1) && t<= (K*6))
{
ordered_effects_gen[t,1] = alpha[1] + beta_adj[t-(5*K),1] + annual_effects_gen[t];
ordered_effects_gen[t,2] = alpha[2] + beta_adj[t-(5*K),2] + annual_effects_gen[t];
}

}

//remaining 16 of 3-hour blocks
for(t in 337:Ncol){
ordered_effects_gen[t,1] = alpha[1] + beta_adj[t-(K*6),1] + annual_effects_gen[t];
ordered_effects_gen[t,2] = alpha[2] + beta_adj[t-(K*6),2] + annual_effects_gen[t];
}

for(n in 1:N){
cdf1[n] =  (1-lambda_gen[kk[n]]) *normal_cdf(y[n], ordered_effects_gen[kk[n],1], sigma) + lambda_gen[kk[n]]*normal_cdf(y[n], ordered_effects_gen[kk[n],2], sigma);
sum_pdf_gen[n] =  log_mix((1-lambda_gen[kk[n]]),normal_lpdf(y[n] | ordered_effects_gen[kk[n],1], sigma),normal_lpdf(y[n] | ordered_effects_gen[kk[n],2], sigma));
}

for(n in 1:index_length){

diff_gen[n] = (cdf1[index2[n]] - cdf1[index1[n]]);

}

log_lik_vec = log_lik_vec_gen( Ncol, Ncol_complete, index_11, index_12, index_18, index_25,  index_26, hour_index,  diff_gen, sum_pdf_gen);

}

Here is the 2-component model for using all the of underlying classical data points
```stan
//For QUT data with 3-hour block interacting with day of the week and day of the year effects
//Fitting with the whole 20+24 days of data
data {
int<lower=1> N; // Total number of observations = 7373

int<lower=1> K; // Index for 3-hour (8*7) day interaction ==56 per week

int<lower=1> D;// indicating number of days fitted D=20+24 days for overall dataset

int<lower=1> Ncol;//==352, 160 for first 20-day while 192 for 24-day

int<lower=1,upper=Ncol> kk[N];//Hour of the day indicator for individual obs

vector[N] dat_complete;//defined LOG-TRANSFORMED observations

int num_basis;

matrix[D, num_basis] B_annual; //sum to zero cyclic bspline

matrix[K, K] K_TD; //penalty prior precision matrix for hour of the week coefficients

matrix[K, (num_basis-1)] eta_bspline;

}

parameters {

vector[(num_basis-1)] beta_eta_raw;

real<lower=0> tau;//RW penalty scale for beta_eta

ordered[2] beta[K];// ordered hour of week effects: a matrix of 56 * 2 where 2nd col
//is greater than 1st col for all rows.

vector[num_basis] beta_annual_raw;//bspline coefficients for annual effects

real<lower=0> sigma;

}

transformed parameters {
vector[(num_basis-1)] beta_eta;

vector[K] eta;

vector[2] alpha;//intercept of hour of week

matrix[K,2] beta_adj;//sum to zero hour of week component mean

vector[num_basis] beta_annual;//coefficients for annual effects

beta_annual = beta_annual_raw*sigma;

beta_eta[1] =beta_eta_raw[1];

for(i in 2:(num_basis-1)){
beta_eta[i] = beta_eta[i-1] + beta_eta_raw[i] * tau;
}

eta = eta_bspline* beta_eta;

alpha[1] = mean(beta[,1]);
alpha[2] = mean(beta[,2]);

for(t in 1:K){

}

}

model{

vector[Ncol] annual_effects_gen;

matrix[Ncol,2] ordered_overall_mean;

vector[K] lambda;

vector[Ncol] lambda_gen;

beta_annual_raw~ normal(0, 1);//prior for the slopes following Gelman 2008

sigma ~cauchy(0,2.5);

beta_eta_raw~normal(0,1);

tau~normal(0,1);

target += multi_normal_prec_lpdf (beta_adj[,1]| rep_vector(0, K),( 1/square(sigma) * K_TD));

target += multi_normal_prec_lpdf (beta_adj[,2]| rep_vector(0, K),( 1/square(sigma) * K_TD));

lambda = inv_logit(eta);

for( t in 1:6){
lambda_gen[(1+(t-1)*56): (56*t) ] = lambda;
}

for(t in 337: Ncol){
lambda_gen[t] = lambda[t-(K*6)];

}

for(t in 1:D){
annual_effects_gen[(1+(t-1)*8) : (8*t)] = rep_vector( (B_annual * beta_annual)[t], 8);
}

//44-day there are whole 6 weeks plus 2 days (16 3-hour blocks)
//week 1-6
for (t in 1:Ncol){
if(t <=K){
ordered_overall_mean[t,1] = alpha[1] + beta_adj[t,1] + annual_effects_gen[t];
ordered_overall_mean[t,2] = alpha[2] + beta_adj[t,2] + annual_effects_gen[t];
}
if(t >=(K+1) && t<= (K*2))
{
ordered_overall_mean[t,1] = alpha[1] + beta_adj[t-K,1] + annual_effects_gen[t];
ordered_overall_mean[t,2] = alpha[2] + beta_adj[t-K,2] + annual_effects_gen[t];
}

if(t >=(2*K+1) && t<= (K*3))
{
ordered_overall_mean[t,1] = alpha[1] + beta_adj[t-(2*K),1] + annual_effects_gen[t];
ordered_overall_mean[t,2] = alpha[2] + beta_adj[t-(2*K),2] + annual_effects_gen[t];
}

if(t >=(3*K+1) && t<= (K*4))
{
ordered_overall_mean[t,1] = alpha[1] + beta_adj[t-(3*K),1] + annual_effects_gen[t];
ordered_overall_mean[t,2] = alpha[2] + beta_adj[t-(3*K),2] + annual_effects_gen[t];
}

if(t >=(4*K+1) && t<= (K*5))
{
ordered_overall_mean[t,1] = alpha[1] + beta_adj[t-(4*K),1] + annual_effects_gen[t];
ordered_overall_mean[t,2] = alpha[2] + beta_adj[t-(4*K),2] + annual_effects_gen[t];
}

if(t >=(5*K+1) && t<= (K*6))
{
ordered_overall_mean[t,1] = alpha[1] + beta_adj[t-(5*K),1] + annual_effects_gen[t];
ordered_overall_mean[t,2] = alpha[2] + beta_adj[t-(5*K),2] + annual_effects_gen[t];
}

}

//remaining 16 of 3-hour blocks
for(t in 337:Ncol){
ordered_overall_mean[t,1] = alpha[1] + beta_adj[t-(K*6),1] + annual_effects_gen[t];
ordered_overall_mean[t,2] = alpha[2] + beta_adj[t-(K*6),2] + annual_effects_gen[t];
}

//likelihood

for(n in 1:N){

target +=log_mix((1-lambda_gen[kk[n]]), normal_lpdf(dat_complete[n]| ordered_overall_mean[kk[n],1], sigma),
normal_lpdf(dat_complete[n]| ordered_overall_mean[kk[n],2], sigma));
}

}

generated quantities{
vector[N] log_lik_vec;//loglikelihood per time period

vector[Ncol] annual_effects_gen;

matrix[Ncol,2] ordered_overall_mean;

vector[K] lambda;

vector[Ncol] lambda_gen;

lambda = inv_logit(eta);

for( t in 1:6){
lambda_gen[(1+(t-1)*56): (56*t) ] = lambda;
}

for(t in 337: Ncol){
lambda_gen[t] = lambda[t-(K*6)];

}

for(t in 1:D){
annual_effects_gen[(1+(t-1)*8) : (8*t)] = rep_vector( (B_annual * beta_annual)[t], 8);
}

//44-day there are whole 6 weeks plus 2 days (16 3-hour blocks)
//week 1-6
for (t in 1:Ncol){
if(t <=K){
ordered_overall_mean[t,1] = alpha[1] + beta_adj[t,1] + annual_effects_gen[t];
ordered_overall_mean[t,2] = alpha[2] + beta_adj[t,2] + annual_effects_gen[t];
}
if(t >=(K+1) && t<= (K*2))
{
ordered_overall_mean[t,1] = alpha[1] + beta_adj[t-K,1] + annual_effects_gen[t];
ordered_overall_mean[t,2] = alpha[2] + beta_adj[t-K,2] + annual_effects_gen[t];
}

if(t >=(2*K+1) && t<= (K*3))
{
ordered_overall_mean[t,1] = alpha[1] + beta_adj[t-(2*K),1] + annual_effects_gen[t];
ordered_overall_mean[t,2] = alpha[2] + beta_adj[t-(2*K),2] + annual_effects_gen[t];
}

if(t >=(3*K+1) && t<= (K*4))
{
ordered_overall_mean[t,1] = alpha[1] + beta_adj[t-(3*K),1] + annual_effects_gen[t];
ordered_overall_mean[t,2] = alpha[2] + beta_adj[t-(3*K),2] + annual_effects_gen[t];
}

if(t >=(4*K+1) && t<= (K*5))
{
ordered_overall_mean[t,1] = alpha[1] + beta_adj[t-(4*K),1] + annual_effects_gen[t];
ordered_overall_mean[t,2] = alpha[2] + beta_adj[t-(4*K),2] + annual_effects_gen[t];
}

if(t >=(5*K+1) && t<= (K*6))
{
ordered_overall_mean[t,1] = alpha[1] + beta_adj[t-(5*K),1] + annual_effects_gen[t];
ordered_overall_mean[t,2] = alpha[2] + beta_adj[t-(5*K),2] + annual_effects_gen[t];
}

}

//remaining 16 of 3-hour blocks
for(t in 337:Ncol){
ordered_overall_mean[t,1] = alpha[1] + beta_adj[t-(K*6),1] + annual_effects_gen[t];
ordered_overall_mean[t,2] = alpha[2] + beta_adj[t-(K*6),2] + annual_effects_gen[t];
}

for(n in 1:N){

log_lik_vec[n] =log_mix((1-lambda_gen[kk[n]]), normal_lpdf(dat_complete[n]|ordered_overall_mean[kk[n],1], sigma),
normal_lpdf(dat_complete[n]|ordered_overall_mean[kk[n],2], sigma));

}
}
``````

Rplot.pdf (1.0 MB)

The first graph is the traceplot for mixture location 1, on the left is fitted with the customised-constructed likelihood function for real data, while the right is fitted with all classical observations for the real data.

Rplot01.pdf (1.1 MB)
The second plot is the same but fitted with simulated 2-component data. It can be seen that the customised-likelihood function model mimcs the classical likelihood function model results very well and it also converged.

Any insights will be greatly appreciated.

My best guess is that the model actually is a bad fit to the real data, i.e. either you have a bug in the model or the data are actually different than what you assume (or both).

I don’t understand why you would want to fit to summary stats only, but I think this strengthens the evidence that the way you coded your model has a bug/math mistake. Unfortunately the code you posted is quite long and I don’t have capacity to try to understand it and figure out what is wrong.

I would first focus on double checking the code does what you think it does. If it converges with simulated data, does it converge on correct values? If it converges on simulated data but not on the real dataset, the real dataset must be somewhat different from the simulated data. Try finding ways to simulate data that are closer to “real” and see if your model stops converging.

After that, my best advice would be to do posterior predictive checks (see manual and this forum if this is not familiar) on the non-converged model to see where its implications differ from the actual data.

Hope that helps.