How to scale hierarchical models when sample count is in tens or hundreds of thousands

ruojol · May 7, 2019, 11:49am

Are there any good tricks for scaling hierarchical models in PyStan, besides using a CPU cluster?
GPU support is not available yet, so I assume Google ML Engine is not an option.
Sample sizes start at over 50K users and range into the millions.


data {
 int<lower=0> N; // number of cases
 int<lower=0> J; // number of groups eg.users
 int<lower=0, upper=1> iszero[N]; // indicates negative outcomes
 int<lower=1> id[N]; // group number for each case
}
parameters {
 vector<lower=0, upper=1>[J] theta; // chance of success per test group
 real<lower=0, upper=1> phi; // population chance of success
 real<lower=1> kappa; // population concentration
}

// Stan model

model {
 // Priors for Bernoulli:
 phi ~ beta(2, 2);
 kappa ~ pareto(1, 1.5); // hyperprior (requires that kappa > 1st Pareto parameter)
 theta ~ beta(kappa * phi, kappa * (1 - phi)); // prior
 // Likelihood sampling statements:
 for (n in 1:N) {
   iszero[n] ~ bernoulli(theta[id[n]]);
 }
}

// Output data

generated quantities {
 real post_phi_global;
 real post_kappa_global;

 post_phi_global = phi;
 post_kappa_global = kappa;
}

bbbales2 · May 7, 2019, 6:28pm

Might not help, but for this model you can make your likelihood easier to evaluate with sufficient statistics.

For each group j, you have:

\prod_i p(y_i | g_j)

Assuming y_i \in \{0, 1\} and p_j is the probability of a 1 in group j, the above expands to:

\prod_i p_j^{y_i} (1 - p_j)^{1 - y_i}

And that’s the same as:

p_j^{\sum_i y_i} (1 - p_j)^{\sum_i (1 - y_i)}

You can compute those per-group sums of y outside Stan and pass them in as data.
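As a rough illustration of that aggregation step, here is a minimal NumPy sketch. The function and variable names (`aggregate_by_group`, `trials`, `successes`) are made up for this example and the data is simulated; it only checks that the per-case Bernoulli log-likelihood and the grouped form from the last formula give the same value.

```python
import numpy as np

def aggregate_by_group(ids, outcomes, num_groups):
    """Collapse per-case 0/1 outcomes into per-group trial and success counts.

    ids are 1-based group indices, as in the Stan data block.
    """
    zero_based = np.asarray(ids) - 1
    outcomes = np.asarray(outcomes)
    trials = np.bincount(zero_based, minlength=num_groups)
    successes = np.bincount(
        zero_based, weights=outcomes, minlength=num_groups
    ).astype(int)
    return trials, successes

def bernoulli_loglik(ids, outcomes, theta):
    """Per-case form: sum over all N rows."""
    p = theta[np.asarray(ids) - 1]
    y = np.asarray(outcomes)
    return np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))

def grouped_loglik(trials, successes, theta):
    """Sufficient-statistic form: sum over the J groups only."""
    return np.sum(
        successes * np.log(theta) + (trials - successes) * np.log1p(-theta)
    )

# Simulated data, purely for illustration.
rng = np.random.default_rng(0)
J, N = 5, 1000
ids = rng.integers(1, J + 1, size=N)
theta = rng.uniform(0.1, 0.9, size=J)
outcomes = rng.binomial(1, theta[ids - 1])

trials, successes = aggregate_by_group(ids, outcomes, J)
print(bernoulli_loglik(ids, outcomes, theta))
print(grouped_loglik(trials, successes, theta))
```

The two printed values should agree up to floating-point error. The grouped form is the binomial likelihood without its constant coefficient, so the per-group counts could be passed as data to a model that uses a binomial likelihood over J groups in place of the loop over N cases.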

ruojol · May 8, 2019, 9:09am

Thanks for the tip, I used this as a basis:

github.com/mdekstrand/rat-tumors/blob/master/ratxmodel.stan

data {
    int<lower=0> J;
    int<lower=0> n[J];
    int<lower=0> y[J];
}
parameters {
    real<lower=0,upper=1> phi;
    real<lower=0.1> lambda;
    real<lower=0,upper=1> theta[J];
}
transformed parameters {
    real<lower=0> alpha;
    real<lower=0> beta;
    alpha = lambda * phi;
    beta = lambda * (1 - phi);
}
model {
    phi ~ beta(1,1);
    lambda ~ pareto(0.1, 1.5);
    theta ~ beta(alpha, beta);

(The file preview is truncated here.)

martinmodrak · May 10, 2019, 6:53am

One more thing to consider: with such a large sample size and a relatively simple model, it is quite possible that you are deep in asymptopia, where the posterior is close to a normal distribution centered on the MAP estimate. (This will likely depend strongly on how big J is compared to N.) If you are there, and that's a big IF, using the optimizing method in Stan could give you very good results. Try running the model on, say, 20K rows of the dataset with both sampling and optimizing; if the results match, you can probably use optimizing safely for your large datasets.
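One way to make "the results match" concrete is to express the gap between the point estimate and the posterior mean in units of posterior standard deviation. The helper below is a hypothetical sketch, not part of Stan or PyStan, and the draws are simulated stand-ins for real sampler output.

```python
import numpy as np

def point_vs_posterior_z(draws, point_estimate):
    """Return (point estimate - posterior mean) / posterior sd per parameter.

    draws has shape (num_draws, num_params); point_estimate has length num_params.
    Values near 0 mean the point estimate sits close to the posterior mean.
    """
    draws = np.asarray(draws, dtype=float)
    point_estimate = np.asarray(point_estimate, dtype=float)
    mean = draws.mean(axis=0)
    sd = draws.std(axis=0, ddof=1)
    return (point_estimate - mean) / sd

rng = np.random.default_rng(0)

# Near-normal posterior: mode and mean coincide, so the z-score is close to 0.
normal_draws = rng.normal(0.3, 0.01, size=(4000, 1))
z_normal = point_vs_posterior_z(normal_draws, [0.3])

# Skewed posterior: Beta(1, 20) has its mode at 0 but its mean near 0.048.
skewed_draws = rng.beta(1, 20, size=(4000, 1))
z_skewed = point_vs_posterior_z(skewed_draws, [0.0])

print(z_normal, z_skewed)
```

With PyStan 2, the draws would presumably come from the fit returned by `StanModel.sampling` (for example via `extract()`) and the point estimate from `StanModel.optimizing`, both run on the same subsample. What counts as "close enough" is a judgment call; large z-scores for theta in groups with few observations would suggest those groups are not yet in the near-normal regime.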
