# Mixture model for unknown fraction of outliers

Hi all,
I am new to Stan and am trying to use it to fit a linear model where the fraction of outliers is unknown. I have written the model in Python for use with emcee, but I would like to use Stan instead.
So far I have gotten my basic linear model (with no outliers) to work:

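```python
line_model = """
data {{
  int<lower=0> N;        // number of data points
  real<lower=0> sigma;   // known measurement noise on y
  vector[N] y;
  vector[N] x;
}}

parameters {{
  real m;   // slope
  real c;   // intercept
}}

transformed parameters {{
  vector[N] theta = m * x + c;   // predicted y for each x
}}

model {{
  // normal() takes (mean, sd), so mlower/mupper act as prior mean and sd here
  m ~ normal({mlower}, {mupper});
  c ~ normal({clower}, {cupper});
  y ~ normal(theta, sigma);
}}
"""
```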

What I am stuck on is adding a parameter that represents the fraction of outliers in the data. I am trying to follow the approach in this blog post, adapted for Stan:

https://dfm.io/posts/mixture-models/

If this is possible and anyone knows how to do it, I would greatly appreciate some help. I have been stuck on this for a few days.

In my case I only care about the parameters m, b (the intercept, c in my model above) and Q.

Thanks,

Have a look at https://mc-stan.org/docs/2_19/stan-users-guide/mixture-modeling-chapter.html for mixture models in Stan.
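As a rough sketch of how that chapter's approach might apply here (not taken from the blog post or from anyone's working code), the outlier indicator for each point can be marginalized out with Stan's `log_mix`. The names `Q`, `mu_out` and `sigma_out` and all prior scales below are assumptions for illustration. `Q` is assumed to be the probability that a point lies on the line, so the outlier fraction would be `1 - Q`. The small pure-Python functions mirror the same per-point likelihood so the arithmetic can be checked without running Stan.

```python
import math

# Hypothetical Stan program: a two-component mixture for a line plus outliers.
# This string is NOT a str.format template, so the braces are single.
mixture_model = """
data {
  int<lower=0> N;
  real<lower=0> sigma;          // known measurement noise on y
  vector[N] y;
  vector[N] x;
}
parameters {
  real m;                       // slope
  real c;                       // intercept
  real<lower=0, upper=1> Q;     // assumed: probability a point is an inlier
  real mu_out;                  // assumed: mean of the outlier component
  real<lower=0> sigma_out;      // assumed: extra spread of the outliers
}
model {
  // placeholder priors; the scales are assumptions, adjust to your data
  m ~ normal(0, 10);
  c ~ normal(0, 10);
  mu_out ~ normal(0, 10);
  sigma_out ~ normal(0, 10);
  // Q gets an implicit uniform(0, 1) prior from its bounds
  for (n in 1:N)
    target += log_mix(Q,
                      normal_lpdf(y[n] | m * x[n] + c, sigma),
                      normal_lpdf(y[n] | mu_out,
                                  sqrt(square(sigma) + square(sigma_out))));
}
"""


def normal_lpdf(y, mu, sd):
    """Log density of Normal(mu, sd) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((y - mu) / sd) ** 2


def log_mix(theta, lp1, lp2):
    """log(theta * exp(lp1) + (1 - theta) * exp(lp2)), for 0 < theta < 1."""
    a = math.log(theta) + lp1
    b = math.log1p(-theta) + lp2
    hi = max(a, b)  # log-sum-exp trick to avoid underflow
    return hi + math.log(math.exp(a - hi) + math.exp(b - hi))


def mixture_log_lik(x, y, sigma, m, c, Q, mu_out, sigma_out):
    """Sum over points of the marginalized inlier/outlier log-likelihood."""
    sd_out = math.sqrt(sigma ** 2 + sigma_out ** 2)
    return sum(
        log_mix(Q,
                normal_lpdf(yn, m * xn + c, sigma),
                normal_lpdf(yn, mu_out, sd_out))
        for xn, yn in zip(x, y)
    )
```

The `mixture_model` string can be passed to whichever Stan interface you already use for `line_model`. If you build it with `str.format` as in the original post, the braces need doubling. Only `m`, `c` and `Q` need to be reported; `mu_out` and `sigma_out` are nuisance parameters in this sketch.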

However, a sparse regression may be what you need: https://betanalpha.github.io/assets/case_studies/bayes_sparse_regression.html and "User-friendly implementation of regularised horseshoe".

Hi, I just happened to find this and it’s very intriguing to me. How would you use a sparse regression to identify outliers in a mixture? Any reference?

Hello, have a look and see if this is helpful.

It's for compositional data, not mixtures, though.
