Multinomial Model - How to set Prior for categorical variables?

I am a beginner using Stan to model structural causal models. I would be extremely grateful if someone could help with the following query. I have looked in the Stan docs as well as the forums, but I wasn’t able to find the answer.

My data contains only categorical variables, as shown below.

location   age        sex   topic         rating
denmark    under 35   M     Computers     1
germany    over 45    F     Cars          3
denmark    over 45    M     Electronics   4
france     under 35   F     Plants        5

The R code is as follows:

raw_data <- read.csv("tp_data.csv")
str(raw_data)
tp <- dplyr::select(raw_data,age,location,sex,rating,topic) 

# Convert the categorical variables to numeric codes
tp$location <- as.numeric(as.factor(tp$location))
tp$rating <- as.numeric(as.factor(tp$rating))
tp$topic <- as.numeric(as.factor(tp$topic))
tp$age <- as.numeric(as.factor(tp$age))
tp$sex <- as.numeric(as.factor(tp$sex))

The Stan code is as follows:

data {
  int<lower = 0> N;  // number of instances in the data
  vector[N] age;     // age
  vector[N] sex;     // sex
  int location[N];   // location
  int topic[N];      // topic
  int rating[N];     // rating
}
parameters {
  real attitude;
  simplex[N] alpha;
  simplex[N] beta;
  simplex[N] gamma;
  alpha = softmax(age + sex);
  beta = softmax(age + sex);
  gamma = softmax(age + sex);
}

model {
  location ~ multinomial(alpha)
  topic ~ multinomial(beta)
  rating ~ multinomial(gamma)
}

Currently this code throws an error, which I have detailed in the thread "Multinomial distribution for modelling Causal Models with all categorical variables".

However, here I am seeking help on how to define the priors.

I am not sure how to set the priors for categorical variables in Stan after encoding them in R. I referred to the thread "Multilevel, Categorical/Multinomial Model- Terms and priors" but wasn’t able to translate it to my data.

Hey Sam, it’ll help if you write down the model in the thread so people can quickly engage.

I’d suggest having a look at BRMS. I don’t use it personally, but I suspect it’ll be much easier to get started there, since you specify your model as a formula and pass the priors for it as arguments.

If you want to use Stan directly I’d suggest starting out by writing as much of the Stan model yourself as you can and then others can help you fill it out.
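As a possible starting point, here is a minimal sketch of a categorical (multi-logit) regression for location with explicit priors. It is not a fix for your exact model, just one common pattern. Everything not in your post is an assumption: the names K, D, x and beta_free are made up for illustration, x is assumed to be a dummy-coded design matrix built from age and sex (with an intercept column), and the normal(0, 1) prior is only a weakly informative placeholder. Two things it does differently from your code: assignments such as alpha = softmax(...) cannot go in the parameters block, and a single category per row is modelled with categorical_logit rather than multinomial, which expects a vector of counts. The array[N] int syntax needs a reasonably recent Stan; older versions write int location[N].

```stan
data {
  int<lower=1> N;                           // number of rows
  int<lower=2> K;                           // number of location categories (assumed name)
  int<lower=1> D;                           // number of columns in x (assumed name)
  matrix[N, D] x;                           // dummy-coded age and sex, incl. intercept column
  array[N] int<lower=1, upper=K> location;  // outcome coded 1..K
}
parameters {
  matrix[D, K - 1] beta_free;               // coefficients for categories 2..K
}
transformed parameters {
  // Pin the coefficients of category 1 to zero so the model is identified.
  matrix[D, K] beta = append_col(rep_vector(0, D), beta_free);
}
model {
  matrix[N, K] eta = x * beta;              // one row of logits per observation

  // Prior on the free coefficients; the scale is an assumption to adjust.
  to_vector(beta_free) ~ normal(0, 1);

  for (n in 1:N) {
    location[n] ~ categorical_logit(eta[n]');
  }
}
```

The same structure could be repeated for topic. The priors live on the regression coefficients rather than on the category probabilities, which is why no simplex parameter appears.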

Also, isn’t rating an ordinal variable?

Hey @emiruz, thank you so much for the response. I really appreciate the suggestions.

Sure, I will write down the model in the thread. I posted my code in another thread where I am seeking a solution to another error (Multinomial distribution for modelling Causal Models with all categorical variables).

I will look into BRMS as well and see how things work out.

Yes, rating is an ordinal variable. Does that mean I shouldn’t be using a multinomial distribution?

No problem. Re rating, I think that depends on your use case and what you’re trying to infer; it may be OK if it’s not naive. Otherwise, the values of an ordinal variable are clearly related, so treating them as independently occurring could bias your analysis.
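If you do want to respect the ordering, one option is an ordered logistic model. The following is a minimal sketch under assumptions, not a recommendation for your specific causal model: K, D, x and cutpoints are illustrative names, x is assumed to be a dummy-coded matrix of age and sex without an intercept column (the cutpoints play that role), and both prior scales are placeholders.

```stan
data {
  int<lower=1> N;                         // number of rows
  int<lower=2> K;                         // number of rating levels (assumed name)
  int<lower=1> D;                         // number of columns in x (assumed name)
  matrix[N, D] x;                         // dummy-coded age and sex, no intercept column
  array[N] int<lower=1, upper=K> rating;  // outcome coded 1..K, in order
}
parameters {
  vector[D] beta;                         // effects on the latent scale
  ordered[K - 1] cutpoints;               // thresholds between adjacent ratings
}
model {
  // Placeholder priors; the scales are assumptions to adjust.
  beta ~ normal(0, 1);
  cutpoints ~ normal(0, 2);

  rating ~ ordered_logistic(x * beta, cutpoints);
}
```

Here a single set of coefficients shifts a latent score and the ordered cutpoints split that score into the rating levels, so neighbouring ratings share information instead of being treated as unrelated categories.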