Brief description of the problem
I need to speed up estimation of a hierarchical Poisson demand model with individual effects on a large dataset.
I wonder whether there are any obvious performance improvements I could make, such as vectorization or reparameterization. I would also like opinions on whether this problem is feasible in Stan at all.
Also, when I try to estimate it on a sample of ~1.5 million observations (about 8,000 products and 16 groups) it gets stuck. I suspect it might be a memory issue.
Details
I want to implement a demand model for electronic products. I'm using a hierarchical Poisson model where the coefficients of interest are the price responses, and where I need to include product-specific and week effects.
The levels of the hierarchy are determined by the different groups products belong to (each product belongs to only one group). The motivation for a hierarchical model is that some products have limited data or minimal price variation during the sampled period and it is desirable that their estimates be pulled towards their group means.
I observe daily sales, prices and other product covariates.
An important assumption I'm making is that I need within-week variation to identify the "true" price response (let's take this as given for the purposes of the question). I could impose this by including product-week dummy variables, but doing so is problematic. To avoid the incidental-parameters problem associated with these individual effects, and also to reduce the number of parameters to estimate, I use a transformation of the Poisson likelihood that leads to an equivalent likelihood proportional to a product of multinomials (*). The resulting model contains no dummies, and the remaining Poisson parameters are identified through within-week variation for each product.
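To make the transformation concrete, here is a small numeric check (a sketch in Python/SciPy, purely illustrative and not part of my pipeline): for independent Poisson counts within a product-week, conditioning on the weekly total yields exactly a multinomial whose probabilities are the softmax of the linear predictor, which is why the product-week intercept drops out.

```python
import numpy as np
from scipy.stats import poisson, multinomial

rng = np.random.default_rng(0)

# one product-week: 7 days, linear predictor eta (e.g. p * beta)
eta = rng.normal(size=7)
alpha = 2.3                          # product-week fixed effect
lam = np.exp(alpha + eta)            # Poisson rates

y = np.array([1, 0, 3, 2, 0, 1, 4])  # hypothetical daily sales
n = y.sum()

# joint Poisson probability of y, conditional on the weekly total n
log_cond = poisson.logpmf(y, lam).sum() - poisson.logpmf(n, lam.sum())

# multinomial with softmax probabilities: alpha has cancelled out
probs = np.exp(eta) / np.exp(eta).sum()
log_mult = multinomial.logpmf(y, n=n, p=probs)

print(np.isclose(log_cond, log_mult))  # the two log-likelihoods agree
```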
Stan Code
This is the Stan model, in which the likelihood is a product of multinomials over product-week blocks of observations.
// Poisson model using product-week FE and a multinomial representation so that
// the FE are concentrated out and not estimated.
data {
  int<lower=1> N;                    // number of observations
  int<lower=1> J;                    // number of products
  int<lower=1> G;                    // number of groups
  int<lower=1> W;                    // number of product-weeks
  int<lower=1> counts[W];            // observations in each product-week (normally 7)
  int<lower=1, upper=J> product[N];  // product of observation n
  int<lower=1, upper=G> group[J];    // group of product j
  int<lower=0> y[N];                 // number of sales
  real p[N];                         // prices
}
parameters {
  real mu_beta[G];
  real<lower=0> sigma_beta[G];
  real beta_raw[J];
}
transformed parameters {
  vector[N] lambda;
  real beta[J];
  // non-centered parameterization: center and scale
  for (j in 1:J)
    beta[j] = mu_beta[group[j]] + sigma_beta[group[j]] * beta_raw[j];  // product-specific price coefficient
  // linear predictor of the Poisson rate
  for (n in 1:N)
    lambda[n] = p[n] * beta[product[n]];
}
model {
  // priors
  beta_raw ~ normal(0, 1);
  mu_beta ~ normal(0, 2.5);
  sigma_beta ~ normal(0, 2.5);
  // likelihood: product of multinomials over product-weeks
  {
    int pos = 1;
    for (aw in 1:W) {
      segment(y, pos, counts[aw]) ~ multinomial(softmax(segment(lambda, pos, counts[aw])));
      pos = pos + counts[aw];
    }
  }
}
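On vectorization: the two loops in `transformed parameters` are just gather operations, so I believe they could be collapsed into single vectorized statements in Stan by declaring `mu_beta`, `sigma_beta`, `beta_raw` and `beta` as vectors (something like `beta = mu_beta[group] + sigma_beta[group] .* beta_raw;`). A quick NumPy sketch of the same computation, only to check that the looped and vectorized forms agree (names mirror the Stan program; indices are 0-based here):

```python
import numpy as np

rng = np.random.default_rng(1)
G, J, N = 4, 200, 5000

mu_beta = rng.normal(size=G)
sigma_beta = np.abs(rng.normal(size=G))
beta_raw = rng.normal(size=J)
group = rng.integers(0, G, size=J)    # 0-based here; Stan is 1-based
product = rng.integers(0, J, size=N)
p = rng.normal(size=N)

# loop version, as in the Stan transformed parameters block
beta_loop = np.empty(J)
for j in range(J):
    beta_loop[j] = mu_beta[group[j]] + sigma_beta[group[j]] * beta_raw[j]
lambda_loop = np.empty(N)
for n in range(N):
    lambda_loop[n] = p[n] * beta_loop[product[n]]

# vectorized gather version (analogue of indexing with `group` and
# `product` directly in Stan)
beta_vec = mu_beta[group] + sigma_beta[group] * beta_raw
lambda_vec = p * beta_vec[product]

print(np.allclose(beta_loop, beta_vec), np.allclose(lambda_loop, lambda_vec))
```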
R Code
This code generates a synthetic dataset and estimates the model. It assumes the Stan file is called discourse_poisson_FE.stan.
library(data.table)
library(truncdist)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
# In my application G = 16 and Jg > 500, which results
# in a dataset of > 1,500,000 rows.
G = 4 # number of groups
Jg = 50 # number of products per group
J = Jg * G # total number of products
W = 27 # number of weeks
week_days = 7
D = W * week_days # number of days
mu_price = 50 + (rbeta(G, 3, 3) - 0.5) * 50
sigma_price = 100 + (rbeta(G, 5, 5) - 0.5) * 50
mu_beta = rbeta(n = G, 4, 3) * 2
sigma_beta = rbeta(n = G, 2, 5) * 6 + 0.1
mu_alpha = 0.5
sigma_alpha = 1.267
mu_theta = rnorm(1, 0, 2.5)
# Generate dataset. I draw from a zero-inflated Poisson as this is closest
# to my application.
simdata = CJ(ASIN = 1:J, dd = 1:D)
simdata[, ww := rep(1:(max(dd)/week_days), each=week_days), by=ASIN]
simdata[, AW_ix := .GRP, by=.(ASIN, ww)]
simdata[, group := cut(ASIN, G, labels = F)]
simdata[, price := 1/100 * rtrunc(.N, a = 0, b = 10000, spec = 'norm', mean = mu_price[group], sd = sigma_price[group]), by=group]
suppressWarnings(
simdata[, beta := rtrunc(n = 1, a = 0, b = 1.5,
spec = 'norm', mean = mu_beta[group],
sd = sigma_beta[group]), by=ASIN]
)
simdata[, alpha := rnorm(1, beta, sigma_alpha), by=AW_ix]
simdata[, lambda := exp(0 * alpha - beta * price)] # note: alpha is multiplied by 0, i.e. the product-week effect is switched off in this simulation
simdata[, theta := rnorm(1, -1, 2.5), by=ASIN]
simdata[, nonzero := as.integer(runif(.N) < plogis(theta))]
simdata[, totalquantity := nonzero * rpois(.N, lambda)]
simdata[, elast := (1/lambda - 1/(exp(-lambda) + 1)) * beta * mean(price), by=ASIN]
stopifnot( 0 == nrow(simdata[!complete.cases(simdata)]) )
print(sum(is.na(simdata$totalquantity)))
# create list to pass to Stan
setkey(simdata, AW_ix)
data_stan = list(
N = simdata[, .N],
J = simdata[, uniqueN(ASIN)],
G = simdata[, uniqueN(group)],
W = simdata[, uniqueN(AW_ix)],
product = simdata[, ASIN],
group = simdata[, first(group), by=ASIN][, V1],
y = simdata[, totalquantity],
p = simdata[, price],
counts = simdata[, .N, by=AW_ix][, N]
)
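One thing that bit me while assembling this list: the segment/counts bookkeeping in the Stan model only works if the rows are sorted by product-week (hence the setkey on AW_ix) and the counts sum to N. A small sanity check of those invariants (a Python sketch with toy stand-ins; my real code does this in R):

```python
import numpy as np

rng = np.random.default_rng(2)

# toy stand-ins for the Stan data: a product-week id per observation
counts_true = rng.integers(1, 8, size=10)      # 1-7 obs per product-week
aw_ix = np.repeat(np.arange(10), counts_true)  # sorted, like setkey(simdata, AW_ix)
counts = np.bincount(aw_ix)                    # obs per product-week, in order

# invariants the segment() loop in the Stan model relies on
assert counts.sum() == aw_ix.size              # segments cover all N obs exactly once
pos = 0
for c in counts:
    seg = aw_ix[pos:pos + c]                   # the slice segment() would take
    assert (seg == seg[0]).all()               # each segment is one product-week
    pos += c
print("ok")
```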
#**************************************************************
# ESTIMATE MODEL WITH STAN
#**************************************************************
model_file = 'discourse_poisson_FE.stan'
compiled_model = stan_model(model_file)
bfit = sampling(compiled_model,
data=data_stan,
chains=getOption("mc.cores"),
iter=1000,
warmup=500,
cores=getOption("mc.cores")
)
EDIT:
(*) See https://en.wikipedia.org/wiki/Fixed-effect_Poisson_model, and Lancaster 2004.