# Multinomial distribution for modelling Causal Models with all categorical variables

I am beginner using Stan for modelling structural causal models. I would be extremely grateful if someone could please help with the following query. I have looked in the Stan docs as well as the forums, but I wasn’t able to find the answer.

My data contains only categorical variables and I am trying to model the causality between different variables using multinomial distribution. Then send the data from Stan to R to predict rating using all the remaining attributes (topic, location,age,sex) using logistic regression.

The data is as follows :

``````
location     age       sex       topic               rating
denmark    under 35     M        Computers             1
germany    over 45      F        Cars                  3
denmark    over 45      M        Electronics           4
france     under 35     F        Plants                5

``````

I am trying to model the following equations :
Location \sim multinomial(age+sex)
topic \sim multinomial(age+sex)
rating \sim multinomial(age+sex)

The Stan code is as follows :

``````
data{
int<lower = 0> N; // number of instances in the data
vector[N] age; // age
vector[N] sex; //sex
int location[N]; // location
int topic[N]; // topic
int rating[N]; //  rating

}
parameters{
real attitude;
simplex[N] alpha;
simplex [N] beta;
simplex [N] gamma;
alpha = softmax (age+sex);
beta = softmax (age+sex) ;
gamma = softmax (age+sex);
}

model{
location ~ multinomial( alpha)
topic ~multinomial(beta)
rating ~multinomial (gamma)
}
``````

The R code is as follows:

``````raw_data <- read.csv("tp_data.csv")
str(raw_data)
tp <- dplyr::select(raw_data,age,location,sex,rating,topic)

#Converting Categorical variables to numeric

tp\$location <- as.numeric(as.factor(tp\$location))
tp\$rating <- as.numeric(as.factor(tp\$rating))
tp\$topic<- as.numeric(as.factor(tp\$topic))
tp\$age <- as.numeric(as.factor(tp\$age))
tp\$sex <- as.numeric(as.factor(tp\$sex))

#A series of test/training partition creation and p is percentage of data into training
trainIndex <- createDataPartition(tp\$location, p = .8,
list = FALSE,
times = 1)
tpTrain <- tp[trainIndex,]
tpTest  <- tp[-trainIndex,]

n <- nrow(tpTrain)
ne <- nrow(tpTest)

# ------------------------------
tp_stan_train <- list(N = n, age = tpTrain[,c("age")], sex = tpTrain[,c("sex")],
location = tpTrain[,c("location")], topic = tpTrain[,c("topic")], rating = tpTrain[,c("rating")])

fit_tp_train <- stan(file = 'tp_train.stan', data = tp_stan_train, iter = 200, chains = 1, verbose = TRUE)

la_tp_train <- extract(fit_tp_train, permuted = TRUE)

## Apply Logistic Regression using model <- glm( rating ~ location+topic+age+sex)
``````

## Error I get is the following SYNTAX ERROR, MESSAGE(S) FROM PARSER: error in ‘model328474b5861_tp_train3’ at line 18, column 2

``````16:   simplex [N] beta;
17:   simplex [N] gamma;
18:   alpha = softmax (age+sex);
^
19:   beta = softmax (age+sex) ;
``````

``````PARSER EXPECTED: <one of the following:
a variable declaration, beginning with type,
(int, real, vector, row_vector, matrix, unit_vector,
simplex, ordered, positive_ordered,
corr_matrix, cov_matrix,
cholesky_corr, cholesky_cov
or '}' to close variable declarations>
Error in stanc(filename, allow_undefined = TRUE) :
failed to parse Stan model 'tp_train3' due to the above error.
``````

If I assign values to alpha in transformed parameters block I get the error that I can’t assign a value to variable declared in another block. Please help.

I am not sure how to set the value of alpha which should be a simplex as the softmax(age + sex) in the Stan code ?

1 Like

Sorry for taking long to respond. You need to move both the declaration of `alpha` and the assignment into the `transformed parameters` block. Also note that the `brms` package supports multinomial models via formula syntax using Stan as the backend. It makes many common tasks easier so unless your aim is to learn Stan in order to be able to write more complex models yourself, it might be a better choice.

Best of luck with your model!

1 Like

@martinmodrak Thank you so much. The code works now. I started looking into `brms` since I was facing errors in Stan and `brms` definitely makes the things a lot easier