Advice on the Efficiency of Stan Specification

Hi,

I am interested in modeling the probability that a consumer will buy each of the available choices, given the prices of those choices. To model this probability, I represented the data in a new format. I would like to ask whether this gives the fastest possible runtime. If not, what should I do instead to improve efficiency? Thank you in advance for the help!

My data takes on the following format:

Table 1 (columns: period, item, price, mix)

Mix represents the proportion of consumers buying each item in that period. In row 1 of Table 1, a mix of 0.4 means that 40% of consumers bought apples in period 1 at a price of $2. Within each period, the mix sums to 1 across all items. Mix is the variable I want to model.

To model Mix, I first transform the data into the following format:

Table 2 (columns: period, mix, apple, banana, carrot, plus one price column per item)

The ‘apple’, ‘banana’ and ‘carrot’ columns in Table 2 are obtained from the ‘item’ column of Table 1: if ‘item’ is apple, the ‘apple’ column is set to 1 while the ‘banana’ and ‘carrot’ columns are set to 0. The last 3 columns of Table 2 are obtained from the ‘price’ column of Table 1 by inserting the price of each item in that row’s period.
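For concreteness, here is a minimal R sketch of that reshaping, assuming Table 1 is a data frame d with columns period, item, price and mix (all of these names are my assumptions):

items <- c("apple", "banana", "carrot")

# 0/1 indicator columns, one per item
dummies <- sapply(items, function(it) as.numeric(d$item == it))

# price of each item in the period of the current row
price_of <- function(it) {
  sub <- d[d$item == it, ]
  sub$price[match(d$period, sub$period)]
}
price_wide <- sapply(items, price_of)
colnames(price_wide) <- paste0("price_", items)

table2 <- cbind(d[, c("period", "mix")], dummies, price_wide)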

Data input into Stan:

Mix is represented by y in the code. N is the number of observations (9 in this case) and K is the number of items/choices available (3 here: apple, banana and carrot). Choice is an N-by-K matrix formed from the ‘apple’, ‘banana’ and ‘carrot’ columns of Table 2.

Price is an N-by-K matrix formed from the last 3 columns of Table 2.
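A hypothetical data list in R matching these declarations (the Table 2 column names are the ones assumed in the reshaping sketch above):

stan.data = list(
  N = nrow(table2),
  K = 3,
  y = table2$mix,
  Choice = as.matrix(table2[, c("apple", "banana", "carrot")]),
  Price = as.matrix(table2[, c("price_apple", "price_banana", "price_carrot")])
)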

The ‘logit-like’ model is fitted to the data with the following structure:

Probability of choosing option $i$ = $\dfrac{\exp(\alpha_i - \beta_i \cdot \text{price}_i)}{\sum_{k=1}^{K} \exp(\alpha_k - \beta_k \cdot \text{price}_k)}$

The code is attached below:

stan.code = "data {
      int<lower=0> N;           //no. of obs
      int<lower=0> K;           // No. of Choice
      vector[N] y;              //outcome
      matrix[N,K] Choice;
      matrix[N,K] Price;
}

parameters {
      vector[K-1] alpha_raw;
      vector<lower=0>[K] beta;
      real<lower=0> sigma;
}

transformed parameters {
      vector[K] alpha;
      alpha = append_row(0, alpha_raw);
}

model {
      alpha_raw ~ normal(0,1);
      beta ~ lognormal(0,1);
      sigma ~ normal(0,0.5);
      y ~ normal( (exp(Choice*alpha - (Choice .* Price)*beta))./
                ((exp(rep_matrix((alpha)',N) - Price .* rep_matrix((beta)',N)))*rep_vector(1,K)), sigma);   
}
"

If y is a simplex for each period, then you should not be using a normal likelihood. @James_Savage has Stan code somewhere for a model like this.

Hi @bgoodri, thanks so much for the comment! May I clarify: for the likelihood of y, should I be using log(softmax(utilities)), as in line 37 of the following gist by @James_Savage?

https://gist.github.com/khakieconomics/0333a54dff4fabf6204080ca5bf92cb6

I think I was thinking of


I have tried the following two sets of code.

Code 1

This code models the probability that consumers will buy each of the products, using a normal likelihood.

stan.code = "data {
  int<lower=0> N;              
  int<lower=0> K;               // No. of Choice
  matrix[N,K] Mix;                //outcome
  matrix[N,K] Price;
}

transformed data {
  vector[N*K] y;
  y = to_vector(Mix);
}

parameters {
  vector[K-1] alpha_raw;
  vector<lower=0>[K] beta;
  real<lower=0> sigma;
}

transformed parameters {
  vector[K] alpha;
  matrix[N,K] gamma;
  vector[N*K] mu;
  alpha = append_row(0, alpha_raw);
  for (i in 1:N) {
    gamma[i] = (softmax( alpha - beta .* Price[i]' ))';
  }
  mu = to_vector(gamma);
}

model {
  alpha_raw ~ normal(0,1);
  beta ~ lognormal(0,1);
  sigma ~ normal(0,0.5);
  y ~ normal( mu, sigma);   
}
"

Code 2

This code models the sales of each product instead of probabilities, so it uses a multinomial likelihood.

"data {
  int<lower=0> N;             
  int<lower=0> K;          // No. of Choice
  matrix[N,K] Price;
  int Mix[N,K];              //outcome
}

parameters {
  vector[K-1] alpha_raw;
  vector<lower=0>[K] beta;
}

transformed parameters {
  vector[K] alpha;
  matrix[N, K] pred_shares;
  alpha = append_row(0, alpha_raw);
  for (t in 1:N) {
    pred_shares[t,1:K] = (softmax((alpha - beta .* (Price[t,1:K])')))'; 
  }
}

model {
  alpha_raw ~ normal(0,1);
  beta ~ lognormal(0,1);
  
  for (t in 1:N) {
    Mix[t] ~ multinomial(pred_shares[t]');
  }
}
  "

May I ask which method is more appropriate? The two methods produce different estimates of the coefficients, and Code 1 takes a shorter time to run than Code 2. Any other advice on how I can improve this further? Thank you!

That’s the one! Here’s a demonstration including the data format. http://rpubs.com/jimsavage/using_decisionmaker_variables_in_mixed_logit

Note you should probably either have an outside option (no choice) or drop one of the choices (and express product attributes as differences from the attributes of the dropped product), so that the implied probabilities of the observed choices sum to something less than 1 and the non-random utility of the outside/dropped good is 0. Otherwise you can add a constant to all utilities and get the same choice probabilities, so the model will be unidentified.
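To spell this out: adding any constant $c$ to all utilities leaves the softmax probabilities unchanged,

$$\frac{\exp(u_i + c)}{\sum_k \exp(u_k + c)} = \frac{e^{c}\exp(u_i)}{e^{c}\sum_k \exp(u_k)} = \frac{\exp(u_i)}{\sum_k \exp(u_k)},$$

so the utility level is only pinned down once one option's utility is fixed at 0.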



Thanks @James_Savage for the advice! I will make amendments to include an outside option. However, I only have data at the aggregate level (for instance, total sales for each item) rather than the decisions made by individuals. Hence, I referred to the example at http://modernstatisticalworkflow.blogspot.com/2017/03/aggregate-random-coefficients-logita.html and came up with Code 2, as shown in my previous comment (http://discourse.mc-stan.org/t/advice-on-the-efficiency-of-stan-specification/4648/5?u=sshcecilia). Would it be possible for you to tell me what is wrong with this code, besides the missing outside option?

Sure –

So if I’m understanding correctly, each row for you is the number of sales of each product within some market? If so, the second proposed model seems right. I’m also assuming here that you don’t want a random-coefficients model, so your estimates will suffer from IIA (independence of irrelevant alternatives) issues.

If you want a specification like the first, which will be faster, then the correct way to do it is to convert each row into sales shares rather than raw sales, and then fit

(log(sales_shares[t]) - log(outside_option_share[t]))' ~ normal(alpha - beta .* Price[t]', sigma)

If you fit this, you should get the same estimates with the two specifications. Let me know if you don’t!
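A minimal Stan sketch of that specification, assuming the shares (including the outside option) have been computed from the sales data beforehand; the data names here are assumptions:

data {
  int<lower=0> N;              // no. of periods/markets
  int<lower=0> K;              // no. of inside goods
  matrix[N,K] shares;          // observed sales shares of the inside goods
  vector[N] outside_share;     // share of the outside option
  matrix[N,K] Price;
}
parameters {
  vector[K] alpha;             // all K intercepts identified once the outside good is pinned to 0
  vector<lower=0>[K] beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 1);
  beta ~ lognormal(0, 1);
  sigma ~ normal(0, 0.5);
  for (t in 1:N)
    (log(shares[t]) - log(outside_share[t]))' ~ normal(alpha - beta .* Price[t]', sigma);
}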

Of course your prices will typically have endogeneity issues, so once you’ve found some cost-shifting instruments or Hausman instruments (prices of the same good in other cities; I personally think these are bad instruments, but they seem to be commonly used), you can use an IV specification in place of the model above. See here: http://modernstatisticalworkflow.blogspot.com/2017/11/bayesian-instrumental-variables-with.html

Hope this helps!

Jim


Here’s a little demo in R that simulates both sets of data (sales and shares) from the aggregate logit model and fits them using both the multinomial specification on sales (which allows for small-market measurement error) and the OLS estimate on log share differences. As you can see, they both recover the generative parameters in the simulation.


softmax <- function(x) exp(x)/sum(exp(x))

simulate_data <- function(T, P, alpha, beta, market_size) {
  prices <- matrix(NA, T, P+1)
  utilities <- matrix(NA, T, P+1)
  shares <- matrix(NA, T, P+1)
  sales <- matrix(NA, T, P+1)
  prices[, P+1] <- 0                                  # outside good has price 0
  prices[, -(P+1)] <- matrix(exp(rnorm(T*P)), T, P)   # lognormal prices for the inside goods
  for(t in 1:T) {
    utilities[t, ] <- t(alpha + prices[t, ] * beta)
    utilities[t, P+1] <- 0                            # utility of the outside good pinned to 0
    shares[t, ] <- t(softmax(utilities[t, ]))
    sales[t, ] <- rmultinom(1, market_size, prob = shares[t, ])
  }
  list(sales = sales, shares = shares, prices = prices, alpha = alpha, beta = beta)
}

some_data <- simulate_data(20, 10, 2, -1, 500)

# Log share of the outside good (column 11), repeated to align with the
# column-major flattening of the inside-good shares below
outside_share <- rep(log(some_data$shares[,11]), times = 10)
shares <- as.vector(log(some_data$shares[,-11]))

Y <- shares - outside_share

p <- as.vector(some_data$prices[,-11])
lm(Y ~ p)   # OLS on log share differences; should recover alpha = 2, beta = -1


model_cd <- "
data {
  int T;
  int P;
  int sales[T, P+1];
  matrix[T, P+1] prices;
}
parameters {
  real alpha;
  real beta;
}
model {
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);

  for(t in 1:T) {
    vector[P+1] predicted_shares = softmax(append_row(alpha + prices[t, 1:P]' * beta, rep_vector(0.0, 1)));
    
    sales[t] ~ multinomial(predicted_shares);
  }
}
"


library(rstan)
options(mc.cores = parallel::detectCores())

compiled_model <- stan_model(model_code = model_cd)

fit <- sampling(compiled_model, data = list(T = 20, P = 10, sales = some_data$sales, prices = some_data$prices))

fit
