I have the following Stan model:
data {
int<lower=1> N; //number of samples = 281
int<lower=1> numSources; // number of sources = 3 (grocery store, restaurant, sushi bar)
int<lower=1> numStates; // number of states = 2 (cooked, raw)
int<lower=1> numAppearances; // number of appearances = 2 (modified, plain)
int<lower=1> numForms; // number of forms = 4 (chunk, chopped, fillet, whole)
int<lower=1> numColours; // number of colours = 2 (light, red)
int<lower=0, upper=1> Mislabelled[N]; // vector of mislabelled samples
int<lower=1, upper=numSources> Source[N]; // vector of sources
int<lower=1, upper=numStates> State[N]; // vector of states
int<lower=1, upper=numAppearances> Appearance[N]; // vector of appearances
int<lower=1, upper=numForms> Form[N]; // vector of forms
int<lower=1, upper=numColours> Colour[N]; // vector of colours
}
parameters {
real beta_0;
vector[numSources] beta_source;
vector[numStates] beta_state;
vector[numAppearances] beta_appearance;
vector[numForms] beta_form;
vector[numColours] beta_colour;
}
model {
vector[N] eta;
//priors
// need to fill in with means and standard deviations
beta_0 ~ normal();
beta_source ~ normal();
beta_state ~ normal();
beta_appearance ~ normal();
beta_form ~ normal();
beta_colour ~ normal();
for (i in 1:N) {
// indexing by the level already selects the right coefficient;
// multiplying by the integer level code would be wrong for categorical predictors
eta[i] = beta_0 + beta_source[Source[i]] +
beta_state[State[i]] +
beta_appearance[Appearance[i]] +
beta_form[Form[i]] +
beta_colour[Colour[i]];
}
// likelihood
Mislabelled ~ bernoulli_logit(eta);
}
generated quantities {
real odds_intercept;
vector[numSources] odds_source;
vector[numStates] odds_state;
vector[numAppearances] odds_appearance;
vector[numForms] odds_form;
vector[numColours] odds_colour;
real prob_intercept;
vector[numSources] prob_source;
vector[numStates] prob_state;
vector[numAppearances] prob_appearance;
vector[numForms] prob_form;
vector[numColours] prob_colour;
odds_intercept = exp(beta_0);
odds_source = exp(beta_source);
odds_state = exp(beta_state);
odds_appearance = exp(beta_appearance);
odds_form = exp(beta_form);
odds_colour = exp(beta_colour);
prob_intercept = odds_intercept / (1 + odds_intercept) * 100;
prob_source = odds_source ./ (1 + odds_source) * 100;
prob_state = odds_state ./ (1 + odds_state) * 100;
prob_appearance = odds_appearance ./ (1 + odds_appearance) * 100;
prob_form = odds_form ./ (1 + odds_form) * 100;
prob_colour = odds_colour ./ (1 + odds_colour) * 100;
}
I want to be able to centre my normal priors (the empty `normal()` calls above) on their MLEs inferred from R's `glm()`. However, I'm unsure how to do this with the current model encoding.
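One way I could imagine doing this (a sketch only; the `mu_*`/`sd_*` names are hypothetical, and the values would be computed in R and passed in with the rest of the data) is to declare the prior locations and scales as data, since Stan's `normal()` accepts vector arguments elementwise:

```stan
data {
// ... existing declarations ...
real mu_0; // e.g. glm() intercept MLE
real<lower=0> sd_0; // e.g. its standard error
vector[numSources] mu_source; // per-level prior means
vector<lower=0>[numSources] sd_source; // per-level prior sds
// ... likewise for state, appearance, form, colour ...
}
model {
beta_0 ~ normal(mu_0, sd_0);
beta_source ~ normal(mu_source, sd_source);
// ...
}
```

This keeps the model code fixed while letting the prior centres change with whatever `glm()` fit produced them.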
Also, since each of my predictors is categorical, I should choose a reference level. Currently the model estimates all L levels of each predictor, rather than L-1 of them. R's `glm()` selects the first level as the reference automatically (the levels here are already alphabetised, so that works out). Probably the easiest way to do this would be something like:
if (i > 1) {
beta_contrast[i - 1] = beta[i] - beta[1]; // contrast against the reference level
...
}
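Another option that mirrors R's treatment-contrast coding directly would be to build the reference level into the parameterisation: estimate only L-1 free coefficients per predictor and pin the first level at zero. A sketch for the source predictor (the `beta_source_free` name is made up here):

```stan
parameters {
vector[numSources - 1] beta_source_free; // L-1 free coefficients
}
transformed parameters {
// first (alphabetically first) level is the reference, fixed at 0
vector[numSources] beta_source = append_row(0, beta_source_free);
}
```

With this coding, `exp(beta_source[i])` in generated quantities would be an odds ratio relative to the reference level rather than a level-specific odds.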
Any thoughts on how to proceed with these?