Hi,
I am interested to model the probability that a consumer will buy each of the choices available given the price of these choices. To model the probability, I represented the data in a new format. I would like to seek advice on whether this gives the fastest runtime possible. If no, may I ask what should I do instead to improve the efficiency? Thank you in advance for the help!
My data takes on the following format:
Table 1
Mix represents the proportion of consumers buying each item for that period. For row 1 of Table 1, a mix of 0.4 suggests that 40% of the consumers bought apple in period 1 at the price of $2. Within each period, the mix should sum to 1 across all the items. Mix is the variable I am interested to model.
To model Mix, I first transform the data into the following format:
Table 2
The ‘apple’, ‘banana’ and ‘carrot’ column in Table 2 are obtained from the ‘item’ column in Table 1. If the ‘item’ column takes on apple, the ‘apple’ column will be allocated the value of 1 while the ‘banana’ and ‘carrot’ columns are allocated the value 0. The last 3 columns of Table 2 are obtained from the ‘price’ column in Table 1 by inserting the price of each item for that period.
Data input into stan:
Mix is represented by y in the code. N represents the number of data and in this case 9 while K represents the number of items/choices available and in this case 3 (apple, banana and carrot). Choice is a matrix of dimension N by K formed by taking the ‘apple’, ‘banana’ and ‘carrot’ column of Table 2.
Price is a matrix of dimension N by K formed by taking the last 3 columns of Table 2.
The ‘logit-like’ model is fitted to the data and it takes on the following structure:
Probability of choosing option i =
The code is attached below:
stan.code = "data {
int<lower=0> N; //no. of obs
int<lower=0> K; // No. of Choice
vector[N] y; //outcome
matrix[N,K] Choice;
matrix[N,K] Price;
}
parameters {
vector[K-1] alpha_raw;
vector<lower=0>[K] beta;
real<lower=0> sigma;
}
transformed parameters {
vector[K] alpha;
alpha = append_row(0, alpha_raw);
}
model {
alpha_raw ~ normal(0,1);
beta ~ lognormal(0,1);
sigma ~ normal(0,0.5);
y ~ normal( (exp(Choice*alpha - (Choice .* Price)*beta))./
((exp(rep_matrix((alpha)',N) - Price .* rep_matrix((beta)',N)))*rep_vector(1,K)), sigma);
}
"