Mixture model for unknown fraction of outliers

Hi all,
I am new to Stan and am trying to use it to fit a linear model where the fraction of outliers is unknown. I have been able to write the model in Python for use with emcee, but I want to use it in Stan.
So far I have gotten my basic linear model (with no outliers) to work:

line_model = """
data {{
    int<lower=0> N;
    real<lower=0> sigma;
    real y[N];
    real x[N];
    
}}

parameters {{
    real m;
    real c;
}}

transformed parameters {{
    real theta[N];
    
    for (j in 1:N)
        theta[j] = m*x[j] + c;
        
}}

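model {{
    // normal(mu, sigma): the two placeholders are the prior mean and standard deviation
    m ~ normal({mlower}, {mupper});
    c ~ normal({clower}, {cupper});
    // likelihood: every point scatters around the line with known sigma
    y ~ normal(theta, sigma);
}}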
"""

What I am stuck on is adding a parameter that represents the fraction of outliers in the data. I am trying to follow the approach in the blog post linked below, adapting it for Stan.
If this is possible and anyone knows how to do it, I would greatly appreciate some help. I have been stuck on this for a few days.

https://dfm.io/posts/mixture-models/

In my case I only care about the parameters m, b and Q.

Thanks,


Have a look at https://mc-stan.org/docs/2_19/stan-users-guide/mixture-modeling-chapter.html for how to write mixture models in Stan.
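
A minimal sketch of how the marginalised mixture from that chapter might look for the line model above. The names Q, mu_out and sigma_out, the priors, and the choice of a single broad Gaussian for the outliers are my assumptions, not something taken from your emcee code. Here Q is taken to be the probability that a point belongs to the line, so 1 - Q is the outlier fraction; flip it if you prefer the other convention.

```python
import math

# Stan program: the straight-line model plus a broad Gaussian outlier component.
# Uses the same array syntax as the model above; recent Stan releases need
# `array[N] real y;` instead of `real y[N];`.
mixture_model = """
data {
    int<lower=0> N;
    real<lower=0> sigma;
    real y[N];
    real x[N];
}
parameters {
    real m;
    real c;
    real<lower=0, upper=1> Q;    // assumed meaning: probability a point is an inlier
    real mu_out;                 // hypothetical mean of the outlier component
    real<lower=0> sigma_out;     // hypothetical scatter of the outlier component
}
model {
    // placeholder priors, adjust to your problem
    m ~ normal(0, 10);
    c ~ normal(0, 10);
    mu_out ~ normal(0, 10);
    sigma_out ~ normal(0, 10);   // half-normal because of the lower bound
    // Q gets an implicit uniform(0, 1) prior from its bounds

    // discrete inlier/outlier labels are summed out point by point
    for (n in 1:N)
        target += log_mix(Q,
                          normal_lpdf(y[n] | m * x[n] + c, sigma),
                          normal_lpdf(y[n] | mu_out, sigma_out));
}
"""


def log_mix(theta, lp1, lp2):
    """Python mirror of Stan's log_mix, for checking values by hand.

    Returns log(theta * exp(lp1) + (1 - theta) * exp(lp2)) for 0 < theta < 1,
    computed with the log-sum-exp trick to avoid underflow.
    """
    a = math.log(theta) + lp1
    b = math.log1p(-theta) + lp2
    hi = max(a, b)
    return hi + math.log(math.exp(a - hi) + math.exp(b - hi))
```

The string can be passed to whichever Stan interface you are already using for the line model. Stan has no discrete parameters, so the per-point labels are marginalised with log_mix rather than sampled.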

However, a sparse regression might be what you need: see https://betanalpha.github.io/assets/case_studies/bayes_sparse_regression.html and "User-friendly implementation of regularised horseshoe".


Hi, I just happened to find this and it’s very intriguing to me. How would you use a sparse regression to identify outliers in a mixture? Any reference?

Hello, have a look and see whether this is helpful.

It's compositional data, not mixtures, though.
