Scaling up a hierarchical model

Thanks @stijn! I will try.

You could use lambda[i] ~ gamma(mu * inv_sigma, inv_sigma),
with inv_sigma = 1 / sigma. That gives lambda[i] a mean of mu and a variance of mu * sigma.

mean(rgamma(10000,1 * 5, 5))
[1] 0.9961446
var(rgamma(10000,1 * 5, 5))
[1] 0.2008218
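The same check as the R snippet above, sketched in stdlib Python (same illustrative values mu = 1, sigma = 1/5, i.e. inv_sigma = 5):

```python
import random
import statistics

# gamma(shape = mu * inv_sigma, rate = inv_sigma) has mean mu and
# variance mu / inv_sigma = mu * sigma.  Python's gammavariate takes
# (shape, scale), and scale = 1 / rate.
random.seed(1)
mu, inv_sigma = 1.0, 5.0
draws = [random.gammavariate(mu * inv_sigma, 1.0 / inv_sigma)
         for _ in range(100_000)]
print(statistics.mean(draws))      # close to mu = 1
print(statistics.variance(draws))  # close to mu * sigma = 0.2
```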

First, you need to use a non-centered parameterization of scale, which won’t speed up iterations but will greatly improve effective sample size.

parameters {
  vector[M] scale_std;
  ...
}
transformed parameters {
  vector<lower = 0>[M] scale = exp(mu + scale_std * sigma);
  ...
}
model {
  scale_std ~ normal(0, 1);
  ...
}
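To see why the transform is legitimate, here is a small Python sketch (the values of mu and sigma are made up for illustration) checking that the centered draw scale ~ lognormal(mu, sigma) and the non-centered draw exp(mu + sigma * scale_std) with scale_std ~ normal(0, 1) have the same distribution:

```python
import math
import random
import statistics

random.seed(2)
mu, sigma = 0.5, 0.3  # illustrative hyperparameter values

# Centered: draw scale directly from its lognormal distribution.
centered = [random.lognormvariate(mu, sigma) for _ in range(100_000)]

# Non-centered: draw a standard normal, then transform deterministically.
noncentered = [math.exp(mu + sigma * random.gauss(0.0, 1.0))
               for _ in range(100_000)]

# Both estimate the lognormal mean exp(mu + sigma^2 / 2), about 1.72 here.
print(statistics.mean(centered), statistics.mean(noncentered))
```

The two parameterizations define the same posterior; the non-centered version helps because scale_std is a priori independent of mu and sigma, which removes the funnel-shaped dependence the sampler otherwise has to navigate.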

You can use a compound declare-define, so that would be:

vector<lower = 0>[M] rate = inv(scale);

We don’t recommend diffuse priors like this:

shape ~ gamma(0.01, 0.01);

Everything should be much better behaved with something with thinner tails and more mass concentrated around where you think the values should be.

Also, the cauchy(0, 5) on sigma is very wide compared to what sigma probably is.
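To put a number on "very wide": the Cauchy CDF has a closed form, so a couple of lines of stdlib Python show how much prior mass a cauchy(0, 5) puts far from zero:

```python
import math

def cauchy_tail(x, scale=5.0):
    """P(|X| > x) for X ~ Cauchy(0, scale); CDF is 0.5 + atan(x/scale)/pi."""
    return 1.0 - 2.0 * math.atan(x / scale) / math.pi

print(round(cauchy_tail(10.0), 3))   # ~0.295: almost 30% of mass beyond 10
print(round(cauchy_tail(100.0), 3))  # ~0.032: still ~3% of mass beyond 100
```

If sigma is plausibly below 1 or so, something like a half-normal with a small scale concentrates the prior where the parameter actually lives.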

Many thanks @Bob_Carpenter! I have a few quick questions.

Q1. Is the following not good?

parameters {
  vector<lower=0>[M] scale_std;
  ...
}
transformed parameters {
  vector<lower=0>[M] scale = mu + scale_std * sigma;
  ...
}

Q2. When using the non-centered parameterization, what hyperprior distributions should I use for mu and sigma? It seems I can leave them undefined?

Q3. I cannot use matrix for discrete distributions, correct?

A1: that’s fine.

A2: if you don’t specify a prior for a variable, it’s implicitly uniform on the values that satisfy its constraints. We recommend at least weakly informative priors everywhere.

A3: Not sure what you mean by matrix for discrete distributions. It’s not fully vectorized, if that’s what you mean. The index of the manual has the legal instantiations including vectorizations (the ones that end in s).

Thank you for sharing your model @kbchoi, and for all that are trying to help.

Indeed, being able to run these kinds of hierarchical models “à la limma” or DESeq in Stan would allow customizing them to specific experimental designs and would be quite useful.

Did you ever succeed in running your model with a few thousand genes? Did you try using narrower priors?

Best,
Arnaud

No luck yet. I am not sure if using narrower priors is the right thing to do (correct me if I am wrong). When parallel Stan arrives (as Aki mentioned above), I will try again.

What model are you using now?
Bob gave you lots of useful hints.
Your hyperpriors have some room for improvement.

Thanks @kbchoi -

I am not sure if using narrower priors is the right thing to do. (Correct me if I am wrong.)

I think that’s what was suggested by @Bob_Carpenter:

We don’t recommend diffuse priors like this: shape ~ gamma(0.01, 0.01)
Everything should be much better behaved with something with thinner tails and more mass concentrated around where you think the values should be.
Also, the cauchy(0, 5) on sigma is very wide compared to what sigma probably is

@stijn also had a suggestion for the priors.

I think narrower priors will definitely help the computation; I would even try normal priors to start with. As for scientific validity, maybe there is an argument for working your way from narrow priors to wider ones and observing the effect on the results. If you find a range where the results are no longer sensitive to the change in “prior width”, that may be a good argument for the validity of the priors.
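The sensitivity check above can be sketched with a conjugate gamma-Poisson pair, where the posterior is available in closed form (the counts and prior settings here are invented purely for illustration):

```python
# Hypothetical counts; with a gamma(alpha, beta) prior on a Poisson rate,
# the posterior is gamma(alpha + sum(y), beta + n), so the posterior mean
# is (alpha + sum(y)) / (beta + n).
y = [3, 5, 4, 6, 2, 4, 5, 3]
n, total = len(y), sum(y)

# Priors from narrow to very diffuse (the last mimics gamma(0.01, 0.01)).
for alpha, beta in [(40.0, 10.0), (4.0, 1.0), (0.4, 0.1), (0.01, 0.01)]:
    prior_mean = alpha / beta
    prior_sd = alpha ** 0.5 / beta
    post_mean = (alpha + total) / (beta + n)
    print(f"prior mean {prior_mean:.1f} (sd {prior_sd:.2f}) "
          f"-> posterior mean {post_mean:.3f}")
```

Here the data dominate, so the posterior mean barely moves as the prior widens; that stability across prior widths is exactly the kind of evidence the paragraph above is after, while the narrower priors still make the computation easier.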