Conditional logit function creation

I am in a stage of learning to use Stan and Rstan for modeling hierarchical problems. I am trying to create my own in stan function to estimate conditional logit models, but in my function definition I have some problem. I am writing to see if it is possible for you to help me. I share my code:

functions { 
  real my_logit(int choice, matrix Xm, row_vector[K] eta) {
    vector[rows(Xm)] probs;
    probs = softmax(Xm * eta);
    return categorical_lpmf(choice| probs);
  int<lower=2> J; // of alternatives/outcomes
  int<lower=1> N;  // of observations
  int<lower=1> K; // of covariates
  int<lower=0,upper=J> Y[N];
  matrix[J,K] X[N];

parameters {
  vector[K] beta;  // attribute effects 

  for (i in 1:N)

[edit: escape formatting]

The problem here is that you’re not incrementing the target log density. Your model loop should be this:

for (n in 1:N) {
  target += my_logit(Y[n], X[n, n], beta);

But you’ll find this won’t compile because your matrix operations are not going to conform dimensionally. Multi-logit has a vector of covariates per item and a matrix of parameters. But you have to be careful about identifiability and even if you get all this right, your model won’t work without an identification strategy (such as a prior on beta).

For the idiomatic way to define multi-logit and a discussion of identifiability in Stan, see:

We also have a more stable categorical_logit built-in which does the softmax internally.

Hello, maybe there is a confusion. The function is rather a nice conditional logit where each alternative or possible value of y has its predictor variables. I’ll give you a data generating function in R. for more detail.

generate_mnl_data <- function(N=1000, K=2, J=3, beta=c(1, -2)){
  if(length(beta) != K) stop ("incorrect number of parameters")
  Y <- rep(NA, N)
  X <- list(NULL) 
  for (i in 1:N) {
    X[[i]] <- matrix(rnorm(J*K), ncol=K)
    Y[i] <- sample(x=J, size=1, prob=exp(X[[i]]%*%beta))
  list(N=N, J=J, K=K, Y=Y, X=X, beta=beta)
d0 <- generate_mnl_data()

where K is the number of attributes of the alternatives, J are the alternatives and N the individuals

I don’t know R that well. Is that prob statement just going to normalize, so the real probabilities would be softmax(X[[I]] %*% beta)?

I had to look up R’s sample function—that takes a single draw with probabilities proportional to the prob argument. So this really just is a categorical softmax thing. So the code is very simple and matches yours, but I didn’t use a function and I used categorical_logit (which applies softmax to its argument).

data {
  int<lower=2> J;
  int<lower=1> N;
  int<lower=1> K;
  array[N] int<lower=1, upper=J> y;
  array[N] matrix[J, K] x;
parameters {
  vector[K] beta;
model {
  for (n in 1:N) {
    y[n] ~ categorical_logit(x[n] * beta);

This model is overparameterized as described in the user’s guide section I linked in the last reply. Thus it’s not identified and won’t work as is. The section discusses strategies for identification:

  1. Pin one of the value sin beta to 0.
  2. Enforce a sum-to-zero constraint on beta
  3. Put a proper prior on beta

The code for these strategies are discussed in our user’s guide. For pinning you declare vector[K - 1] beta_prefix and then append a 0 to the end (or beginning) to define beta. For sum-to-zero, you declare also with K - 1 and then set the last value to the negative sum of the first elements. For prior, you can leave it as K.

Thank you.