Hi all,

I am testing the performance of the horseshoe prior on different datasets from sklearn. One of them is the Iris dataset (https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html). Since Iris has 3 classes, I need to change the model I currently have, which assumes a continuous outcome.

So I need help adapting this model to work with the Iris dataset :( I don’t know much about modeling.

Any suggestion is welcome :)

I attached the model that I currently use.

Thanks and regards.

```
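data {
  int<lower=0> n;              // number of observations
  int<lower=0> p;              // number of predictors
  matrix[n, p] X;              // design matrix
  vector[n] y;                 // continuous response
}
parameters {
  vector[p] beta;              // regression coefficients
  vector<lower=0>[p] lambda;   // local shrinkage scales
  real<lower=0> tau;           // global shrinkage scale
  real<lower=0> sigma;         // residual standard deviation
}
model {
  lambda ~ cauchy(0, 1);
  tau ~ cauchy(0, 1);
  for (i in 1:p)
    beta[i] ~ normal(0, lambda[i] * tau);
  y ~ normal(X * beta, sigma);
}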
```

The Iris dataset has only 4 predictors, so the horseshoe prior is not very useful there: it encodes the prior information that only a small portion of the coefficients are large. There are also 150 observations, so you would have to make the prior very strong for it to have any effect on the coefficients. Do you have other datasets in mind?

See the section on multi-logit regression in the Stan User’s Guide.
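
To make that concrete, here is a rough sketch of how the model above might be adapted to a K-class outcome, following the multi-logit pattern from the User’s Guide. Treat it as a starting point rather than a tested solution: the name `K`, the choice of one `lambda` per coefficient with a single shared `tau`, and the lack of an intercept are my assumptions, and the `array[n] int` syntax needs a reasonably recent Stan version.

```stan
data {
  int<lower=0> n;                     // number of observations
  int<lower=1> p;                     // number of predictors
  int<lower=2> K;                     // number of classes (3 for Iris)
  matrix[n, p] X;                     // design matrix
  array[n] int<lower=1, upper=K> y;   // class labels coded 1..K
}
parameters {
  matrix[p, K] beta;                  // one coefficient column per class
  matrix<lower=0>[p, K] lambda;       // local scales (assumed: one per coefficient)
  real<lower=0> tau;                  // global scale shared by all coefficients
}
model {
  matrix[n, K] x_beta = X * beta;     // linear predictor for every class

  tau ~ cauchy(0, 1);
  to_vector(lambda) ~ cauchy(0, 1);
  // to_vector is column-major for both matrices, so the elements line up
  to_vector(beta) ~ normal(0, to_vector(lambda) * tau);

  for (i in 1:n)
    y[i] ~ categorical_logit(x_beta[i]');
}
```

Note that `sigma` is gone because a categorical likelihood has no residual scale, and the labels have to start at 1 (sklearn codes the Iris targets as 0, 1, 2, so add 1 before passing them in). With K full columns in `beta` the coefficients are only identified through the prior, which the User’s Guide also discusses.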

So far I’m only using datasets from sklearn (diabetes, make_blobs, …). Which other sklearn datasets would you consider for testing?

I’m not familiar with which datasets sklearn has, but there is not much difference between using a horseshoe, a Gaussian, or some other common prior in regression if n>10p, where n is the number of observations and p is the number of predictors. So if you want to test the performance of the horseshoe, or even better the regularized horseshoe, I recommend finding datasets with n<5p and preferably n<p.

Do you know of any datasets with those characteristics? They do not need to be from sklearn.
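
In case it helps, this is roughly what the regularized horseshoe looks like for the linear model posted above, sketched after the parameterization in Piironen and Vehtari (2017). The data names `scale_global`, `slab_scale` and `slab_df` are hyperparameters you would have to choose yourself, and I have kept the half-Cauchy priors from the original model, so take it as an illustration under those assumptions rather than a reference implementation.

```stan
data {
  int<lower=0> n;
  int<lower=0> p;
  matrix[n, p] X;
  vector[n] y;
  real<lower=0> scale_global;   // assumed hyperparameter: scale of the prior on tau
  real<lower=0> slab_scale;     // assumed hyperparameter: scale of the slab
  real<lower=0> slab_df;        // assumed hyperparameter: degrees of freedom of the slab
}
parameters {
  vector[p] z;                  // standardized coefficients (non-centered form)
  vector<lower=0>[p] lambda;    // local shrinkage scales
  real<lower=0> tau;            // global shrinkage scale
  real<lower=0> caux;           // auxiliary variable for the slab width
  real<lower=0> sigma;          // residual standard deviation (no explicit prior, as in the original)
}
transformed parameters {
  real<lower=0> c = slab_scale * sqrt(caux);
  // regularized local scales: large lambda values are capped near c / tau
  vector<lower=0>[p] lambda_tilde
    = sqrt(c^2 * square(lambda) ./ (c^2 + tau^2 * square(lambda)));
  vector[p] beta = z .* lambda_tilde * tau;
}
model {
  z ~ std_normal();
  lambda ~ cauchy(0, 1);
  tau ~ cauchy(0, scale_global * sigma);
  caux ~ inv_gamma(0.5 * slab_df, 0.5 * slab_df);
  y ~ normal(X * beta, sigma);
}
```

The difference from the plain horseshoe is that even the largest coefficients get some shrinkage from the slab, which tends to matter most in exactly the n<p setting.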

Thanks for your help :)