I am a beginner using Stan to model structural causal models, and I would be very grateful for help with the following question. I have looked through the Stan documentation and the forums but could not find an answer.
My data contains only categorical variables, as shown below.
location  age       sex  topic        rating
denmark   under 35  M    Computers    1
germany   over 45   F    Cars         3
denmark   over 45   M    Electronics  4
france    under 35  F    Plants       5
The R code is as follows:
raw_data <- read.csv("tp_data.csv")
str(raw_data)
tp <- dplyr::select(raw_data,age,location,sex,rating,topic)
# Convert categorical variables to integer codes (1, 2, ..., number of levels)
tp$location <- as.numeric(as.factor(tp$location))
tp$rating <- as.numeric(as.factor(tp$rating))
tp$topic <- as.numeric(as.factor(tp$topic))
tp$age <- as.numeric(as.factor(tp$age))
tp$sex <- as.numeric(as.factor(tp$sex))
The Stan code is as follows:
data{
  int<lower = 0> N; // number of instances in the data
  vector[N] age; // age
  vector[N] sex; // sex
  int location[N]; // location
  int topic[N]; // topic
  int rating[N]; // rating
}
parameters{
  real attitude;
  simplex[N] alpha;
  simplex[N] beta;
  simplex[N] gamma;
  alpha = softmax(age + sex);
  beta = softmax(age + sex);
  gamma = softmax(age + sex);
}
model{
  location ~ multinomial(alpha)
  topic ~ multinomial(beta)
  rating ~ multinomial(gamma)
}
Currently this code throws an error, which I have described in the thread "Multinomial distribution for modelling Causal Models with all categorical variables".
However, here I am seeking help on how to define the priors.
I am not sure how to set priors for categorical variables in Stan after encoding them in R. I referred to the thread "Multilevel, Categorical/Multinomial Model- Terms and priors" but wasn’t able to translate it to my data.
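To make the question concrete, here is a minimal sketch of what a prior specification for one categorical outcome (rating) might look like. It is only an illustration under several assumptions, not a tested solution for my data. It uses a multi-logit regression with categorical_logit rather than multinomial, and places normal priors on the regression coefficients. The names K, D, x and beta_free are hypothetical and do not appear in my code above. x is assumed to be a design matrix built in R, with a column of ones followed by 0/1 dummy columns for age and sex. The normal(0, 2) scale is an arbitrary placeholder. The array syntax assumes Stan 2.26 or later, whereas my code above uses the older int rating[N] form.

```stan
data {
  int<lower=1> N;                          // number of observations
  int<lower=2> K;                          // number of rating categories (assumed name)
  int<lower=1> D;                          // number of columns in x (assumed name)
  matrix[N, D] x;                          // assumed: intercept column plus 0/1 dummies for age and sex
  array[N] int<lower=1, upper=K> rating;   // outcome coded 1..K, as produced by as.numeric(as.factor(...))
}
parameters {
  matrix[D, K - 1] beta_free;              // coefficients for categories 2..K
}
transformed parameters {
  // Pin the first category's coefficients to zero so the model is identified.
  matrix[D, K] beta = append_col(rep_vector(0, D), beta_free);
}
model {
  // Prior on the free coefficients; the scale 2 is a placeholder, not a recommendation.
  to_vector(beta_free) ~ normal(0, 2);

  // Each observation gets a K-vector of logits; categorical_logit applies softmax internally.
  for (n in 1:N) {
    rating[n] ~ categorical_logit((x[n] * beta)');
  }
}
```

In this sketch the prior sits on the coefficients, and the category probabilities are derived from them through softmax, so no simplex parameter is declared. Whether the same structure is appropriate for location and topic in a causal model is part of what I am unsure about.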