Centering priors on MLEs in categorical logistic regression

I have the following Stan model:

data {
  int<lower=1> N; //number of samples = 281
  int<lower=1> numSources; // number of sources = 3 (grocery store, restaurant, sushi bar)
  int<lower=1> numStates; // number of states = 2 (cooked, raw)
  int<lower=1> numAppearances; // number of appearances = 2 (modified, plain)
  int<lower=1> numForms; // number of forms = 4 (chunk, chopped, fillet, whole)
  int<lower=1> numColours; // number of colours = 2 (light, red)

  int<lower=0, upper=1> Mislabelled[N]; // vector of mislabelled samples
  int<lower=1, upper=numSources> Source[N]; // vector of sources
  int<lower=1, upper=numStates> State[N]; // vector of states
  int<lower=1, upper=numAppearances> Appearance[N]; // vector of appearances
  int<lower=1, upper=numForms> Form[N]; // vector of forms
  int<lower=1, upper=numColours> Colour[N]; // vector of colours
}

parameters {
  real beta_0;

  vector[numSources] beta_source;
  vector[numStates] beta_state;
  vector[numAppearances] beta_appearance;
  vector[numForms] beta_form;
  vector[numColours] beta_colour;
}

model {
  vector[N] eta;

  //priors

 // need to fill in with means and standard deviations

  beta_0 ~ normal();

  beta_source ~ normal();
  beta_state ~ normal();
  beta_appearance ~ normal();
  beta_form ~ normal();
  beta_colour ~ normal();

  for (i in 1:N) {
    // pick out the coefficient for each sample's category level
    eta[i] = beta_0 + beta_source[Source[i]] +
                 beta_state[State[i]] +
                 beta_appearance[Appearance[i]] +
                 beta_form[Form[i]] +
                 beta_colour[Colour[i]];
  }

  // likelihood
  Mislabelled ~ bernoulli_logit(eta);
}

generated quantities {
  real odds_intercept;
  vector[numSources] odds_source;
  vector[numStates] odds_state;
  vector[numAppearances] odds_appearance;
  vector[numForms] odds_form;
  vector[numColours] odds_colour;

  real prob_intercept;
  vector[numSources] prob_source;
  vector[numStates] prob_state;
  vector[numAppearances] prob_appearance;
  vector[numForms] prob_form;
  vector[numColours] prob_colour;

  odds_intercept = exp(beta_0);
  odds_source = exp(beta_source);
  odds_state = exp(beta_state);
  odds_appearance = exp(beta_appearance);
  odds_form = exp(beta_form);
  odds_colour = exp(beta_colour);

  prob_intercept = odds_intercept / (1 + odds_intercept) * 100;
  prob_source = odds_source ./ (1 + odds_source) * 100;
  prob_state = odds_state ./ (1 + odds_state) * 100;
  prob_appearance = odds_appearance ./ (1 + odds_appearance) * 100;
  prob_form = odds_form ./ (1 + odds_form) * 100;
  prob_colour = odds_colour ./ (1 + odds_colour) * 100;

}

I want to be able to centre my normal priors (`normal()`) on their MLEs inferred from `R::glm()`. However, I’m unsure how to do this with the current model encoding.
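For concreteness, here is roughly what I have in mind (an untested sketch; the prior_mean_* / prior_sd_* names are just placeholders I’ve made up): the prior locations and scales would be passed in through the data block and then used in the sampling statements, e.g.

data {
  // ... existing declarations ...
  real prior_mean_0;                             // e.g. the glm() intercept estimate
  real<lower=0> prior_sd_0;
  vector[numSources] prior_mean_source;          // one prior mean per source level
  vector<lower=0>[numSources] prior_sd_source;   // one prior sd per source level
  // ... and similarly for state, appearance, form and colour
}

model {
  beta_0 ~ normal(prior_mean_0, prior_sd_0);
  beta_source ~ normal(prior_mean_source, prior_sd_source);
  // ... and similarly for the other coefficient vectors
}

That way the Stan program itself stays generic, and whatever prior means and standard deviations I settle on would be supplied from R along with the rest of the data.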

Also, since each of my predictors is categorical, I should choose a reference level. Currently the model outputs all L levels rather than L-1 of them. R::glm() selects the first level as the reference automatically (provided the levels are alphabetized, as they are here). Probably the easiest way to do this would be something like:

if (i > 1) {
     beta[i] - beta[1]
     ...
}
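Or maybe something like the following (again untested, and the beta_source_raw name is just one I’ve made up): declare only L-1 free coefficients per predictor and pin the first (reference) level’s coefficient to zero, shown here for Source only:

parameters {
  vector[numSources - 1] beta_source_raw;   // coefficients for levels 2..L
  // ... likewise for the other predictors
}

transformed parameters {
  // prepend a structural zero so that level 1 acts as the reference
  vector[numSources] beta_source = append_row(0, beta_source_raw);
}

model {
  // eta would then use beta_source[Source[i]] exactly as before
}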

Any thoughts on how to proceed with these?

That seems to be double dipping. Your usage of data (MLE) information in the prior will make the effective sample size of the posterior larger than the actual sample size. Don’t you want to use MLEs as initial values in sampling instead?

@harrelfe I’ve seen several papers by colleagues that centre their priors on MLEs… They are probably not Bayesians. Regardless, the prior should be selected independently of the data (or any statistical summary thereof). Given this, I’m curious about your suggestion to use the MLEs as starting conditions instead. Wouldn’t this have the same consequence that you bring up in your response? It would speed up convergence to the posterior distribution, though.

First of all, I have seldom needed to override the default random starting values in Stan. But to your question: no, it’s not the same, unless the sampling goes haywire so that the results after thousands of posterior draws were indeed dependent on the choice of initial values.

I also run the Stan optimizer and then feed the resulting MLE into the sampler as starting values.