Advice on the Efficiency of Stan Specification

specification

#1

Hi,

I am interested to model the probability that a consumer will buy each of the choices available given the price of these choices. To model the probability, I represented the data in a new format. I would like to seek advice on whether this gives the fastest runtime possible. If no, may I ask what should I do instead to improve the efficiency? Thank you in advance for the help!

My data takes on the following format:

Table 1

Mix represents the proportion of consumers buying each item for that period. For row 1 of Table 1, a mix of 0.4 suggests that 40% of the consumers bought apple in period 1 at the price of $2. Within each period, the mix should sum to 1 across all the items. Mix is the variable I am interested to model.

To model Mix, I first transform the data into the following format:

Table 2

The ‘apple’, ‘banana’ and ‘carrot’ column in Table 2 are obtained from the ‘item’ column in Table 1. If the ‘item’ column takes on apple, the ‘apple’ column will be allocated the value of 1 while the ‘banana’ and ‘carrot’ columns are allocated the value 0. The last 3 columns of Table 2 are obtained from the ‘price’ column in Table 1 by inserting the price of each item for that period.

Data input into stan:

Mix is represented by y in the code. N represents the number of data and in this case 9 while K represents the number of items/choices available and in this case 3 (apple, banana and carrot). Choice is a matrix of dimension N by K formed by taking the ‘apple’, ‘banana’ and ‘carrot’ column of Table 2.

Price is a matrix of dimension N by K formed by taking the last 3 columns of Table 2.

The ‘logit-like’ model is fitted to the data and it takes on the following structure:

Probability of choosing option i =

The code is attached below:

stan.code = "data {
      int<lower=0> N;           //no. of obs
      int<lower=0> K;           // No. of Choice
      vector[N] y;              //outcome
      matrix[N,K] Choice;
      matrix[N,K] Price;
}

parameters {
      vector[K-1] alpha_raw;
      vector<lower=0>[K] beta;
      real<lower=0> sigma;
}

transformed parameters {
      vector[K] alpha;
      alpha = append_row(0, alpha_raw);
}

model {
      alpha_raw ~ normal(0,1);
      beta ~ lognormal(0,1);
      sigma ~ normal(0,0.5);
      y ~ normal( (exp(Choice*alpha - (Choice .* Price)*beta))./
                ((exp(rep_matrix((alpha)',N) - Price .* rep_matrix((beta)',N)))*rep_vector(1,K)), sigma);   
}
"

#2

If y is a simplex for each period, then you should not be using a normal likelihood. @James_Savage has Stan code somewhere for a model like this.


#3

Hi @bgoodri, thanks so much for the comment! May I clarify for the likelihood of y, should I be using log(softmax(utilities)) like the one in line 37 of the following link by @James_Savage?

https://gist.github.com/khakieconomics/0333a54dff4fabf6204080ca5bf92cb6


#4

I think I was thinking of


#5

I have tried the following 2 sets of codes.

Code 1

This code models the probability that consumers will buy each of the product. This utilizes normal distribution.

stan.code = "data {
  int<lower=0> N;              
  int<lower=0> K;               // No. of Choice
  matrix[N,K] Mix;                //outcome
  matrix[N,K] Price;
}

transformed data {
  vector[N*K] y;
  y = to_vector(Mix);
}

parameters {
  vector[K-1] alpha_raw;
  vector<lower=0>[K] beta;
  real<lower=0> sigma;
}

transformed parameters {
  vector[K] alpha;
  matrix[N,K] gamma;
  vector[N*K] mu;
  alpha = append_row(0, alpha_raw);
  for (i in 1:N) {
    gamma[i] = (softmax( alpha - beta .* Price[i]' ))';
  }
  mu = to_vector(gamma);
}

model {
  alpha_raw ~ normal(0,1);
  beta ~ lognormal(0,1);
  sigma ~ normal(0,0.5);
  y ~ normal( mu, sigma);   
}
"

Code 2

This code models sales of each product instead of probability. Hence, it utilizes multinomial distribution.

"data {
  int<lower=0> N;             
  int<lower=0> K;          // No. of Choice
  matrix[N,K] Price;
  int Mix[N,K];              //outcome
}

parameters {
  vector[K-1] alpha_raw;
  vector<lower=0>[K] beta;
}

transformed parameters {
  vector[K] alpha;
  matrix[N, K] pred_shares;
  alpha = append_row(0, alpha_raw);
  for (t in 1:N) {
    pred_shares[t,1:K] = (softmax((alpha - beta .* (Price[t,1:K])')))'; 
  }
}

model {
  alpha_raw ~ normal(0,1);
  beta ~ lognormal(0,1);
  
  for (t in 1:N) {
    Mix[t] ~ multinomial(pred_shares[t]');
  }
}
  "

May I ask which method would be more appropriate? Both methods produce different estimates of the coefficients. In addition, code 1 takes a shorter time to run compared to code 2. Any other advice on how I can improve this further? Thank you!


#6

That’s the one! Here’s a demonstration including the data format. http://rpubs.com/jimsavage/using_decisionmaker_variables_in_mixed_logit

Note you should probably either have an outside option (no choice) or drop one of the choices (and express product attributes as differences from the attributes of the dropped product) so that implied probabilities for the observed choices sums to something less than 0, and the nonrandom utility for the outside/dropped good is 0. Else you can add some number to all utilities and come up with the same choices–the model will be unidentified.


#7

Jim’s/


#8

Thanks @James_Savage for the advice! I will make ammendments to include an outside option. However, I only have data at the aggregate level, for instance the total sales for each item, rather than the decision made by the individuals. Hence, I referred to the example from http://modernstatisticalworkflow.blogspot.com/2017/03/aggregate-random-coefficients-logita.html
and came up with code 2 as shown in my previous comment (http://discourse.mc-stan.org/t/advice-on-the-efficiency-of-stan-specification/4648/5?u=sshcecilia). Would it be possible for you to tell me what is wrong with this code besides adding an outside option?


#9

Sure –

So if I’m understanding correctly, each row for you is the number of sales of each product within some market? If so, the second proposed model seems right. I’m also assuming here that you don’t want a random coefficient model, so your estimates will suffer from IIA issues.

If you want a specification like the first, which will be faster, then the correct way of doing it is to convert each row into sales shares, rather than raw sales. Then fit

(log(sales_shares[t]) - log(outside_option_share[t]))' ~ normal(alpha - beta * Price[t]', sigma)

If you fit this, you should get the same estimates with the two specifications. Let me know if you don’t!

Of course your prices will typically have endogeneity issues, so once you’ve found some cost-shifting instruments or Hausman instruments (price of same good in other cities—I personally think these are bad instruments but they seem to be commonly used accepted)—you can use an IV specification in place of the model above. See here http://modernstatisticalworkflow.blogspot.com/2017/11/bayesian-instrumental-variables-with.html

Hope this helps!

Jim


#10

Here’s a little demo in R that simulates both sets of data from the aggregate logit model and fits them using both the multinomial specification of sales (which allows for small market measurement error) and the OLS estimate. As you can see, they both recover the generative parameters in the simulation.


softmax <- function(x) exp(x)/sum(exp(x))

simulate_data <- function(T, P, alpha, beta, market_size) {
  prices <- matrix(NA, T, P+1)
  utilities <-matrix(NA, T, P+1)
  shares <- matrix(NA,  T, P+1)
  sales <- matrix(NA,  T, P+1)
  prices[,P+1] <- 0
  prices[,-(P+1)] <- matrix(exp(rnorm(T*P)), T, P)
  for(t in 1:T) {
    utilities[t,] <- t(alpha + prices[t,] * beta)
    utilities[t,P+1] <- 0
    shares[t,] <- t(softmax(utilities[t,]))
    sales[t,] <- rmultinom(1, market_size, prob = shares[t,])
  }
  
  list(sales = sales, shares = shares, prices = prices, alpha, beta)
  
}

some_data <- simulate_data(20, 10, 2, -1, 500)

outside_share <- rep(log(some_data$shares[,11]), each = 10)
shares <- as.vector(log(some_data$shares[,-11]))

Y <- shares - outside_share

p <- as.vector(some_data$prices[,-11])
lm(Y ~ p)


model_cd <- "
data {
  int T;
  int P;
  int sales[T, P+1];
  matrix[T, P+1] prices;
}
parameters {
  real alpha;
  real beta;
}
model {
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);

  for(t in 1:T) {
    vector[P+1] predicted_shares = softmax(append_row(alpha + prices[t, 1:P]' * beta, rep_vector(0.0, 1)));
    
    sales[t] ~ multinomial(predicted_shares);
  }
}
"


library(rstan)
options(mc.cores = parallel::detectCores())

compiled_model <- stan_model(model_code = model_cd)

fit <- sampling(compiled_model, data = list(T = 20, P = 10, sales = some_data$sales, prices = some_data$prices))

fit