I am in a stage of learning to use Stan and Rstan for modeling hierarchical problems. I am trying to create my own in stan function to estimate conditional logit models, but in my function definition I have some problem. I am writing to see if it is possible for you to help me. I share my code:
functions {
real my_logit(int choice, matrix Xm, row_vector[K] eta) {
vector[rows(Xm)] probs;
probs = softmax(Xm * eta);
return categorical_lpmf(choice| probs);
}
}
data{
int<lower=2> J; // of alternatives/outcomes
int<lower=1> N; // of observations
int<lower=1> K; // of covariates
int<lower=0,upper=J> Y[N];
matrix[J,K] X[N];
}
parameters {
vector[K] beta; // attribute effects
}
model{
for (i in 1:N)
my_logit(Y[i],X[i,i],beta[1:K])
}
[edit: escape formatting]
The problem here is that you’re not incrementing the target log density. Your model loop should be this:
for (n in 1:N) {
target += my_logit(Y[n], X[n, n], beta);
}
But you’ll find this won’t compile because your matrix operations are not going to conform dimensionally. Multi-logit has a vector of covariates per item and a matrix of parameters. But you have to be careful about identifiability and even if you get all this right, your model won’t work without an identification strategy (such as a prior on beta
).
For the idiomatic way to define multi-logit and a discussion of identifiability in Stan, see:
We also have a more stable categorical_logit
built-in which does the softmax internally.
Hello, maybe there is a confusion. The function is rather a nice conditional logit where each alternative or possible value of y has its predictor variables. I’ll give you a data generating function in R. for more detail.
generate_mnl_data <- function(N=1000, K=2, J=3, beta=c(1, -2)){
if(length(beta) != K) stop ("incorrect number of parameters")
Y <- rep(NA, N)
X <- list(NULL)
for (i in 1:N) {
X[[i]] <- matrix(rnorm(J*K), ncol=K)
Y[i] <- sample(x=J, size=1, prob=exp(X[[i]]%*%beta))
}
list(N=N, J=J, K=K, Y=Y, X=X, beta=beta)
}
d0 <- generate_mnl_data()
where K is the number of attributes of the alternatives, J are the alternatives and N the individuals
I don’t know R that well. Is that prob statement just going to normalize, so the real probabilities would be softmax(X[[I]] %*% beta)
?
I had to look up R’s sample function—that takes a single draw with probabilities proportional to the prob
argument. So this really just is a categorical softmax thing. So the code is very simple and matches yours, but I didn’t use a function and I used categorical_logit
(which applies softmax to its argument).
data {
int<lower=2> J;
int<lower=1> N;
int<lower=1> K;
array[N] int<lower=1, upper=J> y;
array[N] matrix[J, K] x;
}
parameters {
vector[K] beta;
}
model {
for (n in 1:N) {
y[n] ~ categorical_logit(x[n] * beta);
}
}
This model is overparameterized as described in the user’s guide section I linked in the last reply. Thus it’s not identified and won’t work as is. The section discusses strategies for identification:
- Pin one of the value sin
beta
to 0.
- Enforce a sum-to-zero constraint on
beta
- Put a proper prior on
beta
The code for these strategies are discussed in our user’s guide. For pinning you declare vector[K - 1] beta_prefix
and then append a 0 to the end (or beginning) to define beta
. For sum-to-zero, you declare also with K - 1
and then set the last value to the negative sum of the first elements. For prior, you can leave it as K.