How to specify the mean and variance of a lognormal in a vectorized fashion?

Hello. I am transitioning from JAGS to Stan and trying to understand how to specify the mean and variance of a lognormal in a vectorized fashion. My code is below. I think I should perform the vectorization in the “transformed parameters” block, right? Am I specifying the hierarchical mean and variance correctly? (@martinmodrak maybe you can help me here) Thank you!

data {
  // N
  int<lower=1> n_obs; // number of clusters
  int<lower=1> n_region; // number of regions
  
  int<lower=1,upper=n_region> region_id[n_obs]; // region id

  int<lower=0> pop_count[n_obs]; // pop count per cluster
  vector<lower=0>[n_obs] building_count; // building count per cluster
}

parameters {
  vector<lower=0> [n_obs] pop_density; // pop density per cluster
  
  real mu_alpha_global; // global mean
  real<lower=0> sigma_alpha_global; //national variance
  
  vector [n_region] mu_alpha_region; //region mean
  real<lower=0> sigma_alpha_region; //region variance
  vector [n_region] mu_sigma_region; //region mean variance
  real<lower=0> sigma_sigma_region; //region type variance variance
}

transformed parameters {
    vector [n_region] alpha_region; //region alpha
    vector<lower=0> [n_region] sigma_region; //region sigma

    alpha_region = mu_alpha_global + mu_alpha_region*sigma_alpha_region; //vectorized region alpha estimation
    sigma_region = sigma_alpha_global + mu_sigma_region*sigma_sigma_region; //vectorized region sigma estimation
}

model {
  //hyperpriors
  mu_alpha_global ~ normal(0,1);
  sigma_alpha_global ~ normal(0,1);
  
  mu_alpha_region ~ normal(0,1);
  sigma_alpha_region ~ normal(0,1);
  
  mu_sigma_region ~ normal(0,1);
  sigma_sigma_region ~ normal(0,1);
  
  // hierarchical alpha and sigma per region
  pop_density ~ lognormal(alpha_region[region_id], sigma_region[region_id]);
  // observed population counts
  pop_count ~ poisson(pop_density .* building_count);
}

Hi,
I am not sure I understand what the model aims to do, but it looks like it is heading roughly in a sensible direction towards a Poisson-LogNormal model defined via latent variables (data augmentation). I’ve noticed some people prefer Poisson-LogNormal to the negative binomial and similar distributions (which do not require latent parameters), but I admit I have no idea where each of those distributions works better.
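For illustration, the marginalized alternative could look roughly like this (just a sketch reusing your variable names on the log scale; phi would be a new overdispersion parameter you would have to declare):

parameters {
  real<lower=0> phi; // overdispersion (new parameter, not in your model)
}

model {
  // negative binomial response with a log link; log(building_count) acts as an offset,
  // so no latent pop_density is needed
  pop_count ~ neg_binomial_2_log(alpha_region[region_id] + log(building_count), phi);
}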

The difference between computing a variable in transformed parameters as opposed to model is mostly that transformed parameters get stored in the fit while local variables in the model block do not. So for many use cases it doesn’t matter that much where you compute your values.
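For example, the same computation could equally live in the model block as a local variable (just a sketch; nothing else changes):

model {
  // identical computation as a local variable: sampling behaves the same,
  // but alpha_region will not appear in the saved draws
  vector[n_region] alpha_region = mu_alpha_global + mu_alpha_region * sigma_alpha_region;
  // ... rest of the model block as before ...
}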

Usually when one models sigma (or any other parameter bound to be positive) as varying, one would model the logarithm of sigma. Your code currently does not ensure that sigma_region is going to be positive for all parameter values, which is likely to lead to problems when sampling.

It seems that the variable n_train is not defined in the code but used in a few places, so I am not sure what to make of it. But if region_id and region_id_train are supposed to be the same thing, then this looks sensible…

In any case, a good way to check your model is to test whether you can recover parameters from simulated data.
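One way to do that (a rough sketch only, simplified to a single global mean and sigma just to show the mechanics; the _true/_sim names are mine) is a separate Stan file run with algorithm=fixed_param, whose draws you then feed back into the original model:

data {
  int<lower=1> n_obs;                     // number of clusters
  vector<lower=0>[n_obs] building_count;  // building count per cluster
}

generated quantities {
  real mu_true = normal_rng(0, 1);                   // draw "true" parameters from the priors
  real<lower=0> sigma_true = fabs(normal_rng(0, 1)); // half-normal draw
  vector[n_obs] pop_density_true;
  int pop_count_sim[n_obs];
  for (i in 1:n_obs) {
    pop_density_true[i] = lognormal_rng(mu_true, sigma_true);
    pop_count_sim[i] = poisson_rng(pop_density_true[i] * building_count[i]);
  }
}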

Best of luck with transitioning to Stan - it can be intimidating, but I hope you’ll find it worth the effort…


Hi Martin — thank you for your quick reply!

I am using this model — https://www.pnas.org/content/117/39/24173 — as a baseline for a similar study. First off, I just edited my question because I mistakenly used n_train instead of n_obs and region_id_train instead of region_id in some parts. I was trying to simplify the original model for this question but forgot to rename some parameters.

I found it hard to understand the difference between the transformed parameters and model blocks, but now it is clear. Also, I noticed that in some iterations I had a negative sigma_region. I will model it on the logarithmic scale. Thanks for the hint!

I am currently working with simulated data so it will be straightforward to check if the model works!

Hi @martinmodrak ,
Interesting answer.
Would that mean that bounding the support of sigma to be above zero won’t ensure sigma to be positive?
Thanks!

This is a good question and a point of frequent confusion. In general, Stan does very little magic with your model, because the code is treated as just a procedure to compute the target density given the parameters, and Stan does not “understand” the model in any way. This absence of “understanding” actually allows Stan to be very flexible and fast, but may be surprising to people coming from e.g. JAGS, which AFAIK actually analyses and in some sense “understands” the model code.

Constraints on variables in Stan actually do two different things depending on the program block. When you use constraints in the parameters block, Stan automatically transforms parameters from the unconstrained scale to create values that satisfy the constraints (in parameters, Stan is in control of the values and the user can’t change them). Constraints are also valid in the transformed parameters and generated quantities blocks, but there they serve only as checks - the user is responsible for computing the values from the parameters, and at the end of the block Stan verifies that the computed values satisfy the constraints and throws an error if they do not.

So for computed values, the user is responsible for making sure the constraints are met - there are multiple ways to do this, and modelling the logarithm is just one that is frequently found useful.
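A tiny example of the difference (unrelated to your model):

parameters {
  real<lower=0> a; // Stan transforms an unconstrained value, so a is always positive
}

transformed parameters {
  real<lower=0> b = a - 1; // nothing forces b to be positive; Stan only checks the
                           // constraint at the end of the block and complains when a < 1
}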

Hope that clarifies more than confuses :-)


That makes sense. Thank you for the clarification. If I understand correctly, should I change the transformed parameters part as follows? After testing it, it runs, but I am not sure if it is the proper way to do it…

 transformed parameters {
    vector [n_region] alpha_region; //region alpha
    vector<lower=0> [n_region] sigma_region; //region sigma
    vector<lower=0> [n_region] exp_sigma_region; //exp region sigma

    alpha_region = mu_alpha_global + mu_alpha_region*sigma_alpha_region; //vectorized region alpha estimation
    exp_sigma_region = exp(sigma_alpha_global + mu_sigma_region*sigma_sigma_region); //vectorized region sigma estimation

    sigma_region = log(exp_sigma_region);
}

My suggestion was more like (keeping only the stuff related to sigma):

parameters {
   real sigma_alpha_global; //no bounds here
   vector [n_region] mu_sigma_region;
   real<lower=0> sigma_sigma_region;
}

transformed parameters {
    vector<lower=0> [n_region] sigma_region; //region sigma

    sigma_region = exp(sigma_alpha_global + mu_sigma_region*sigma_sigma_region); //exp ensures the result is positive for all parameter values
}

It is really exactly the same thing as the “log link” you see people use for e.g. a Poisson response.
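I.e. something like this (y and eta are just illustrative names):

// the two statements express the same model; exp() on the linear predictor
// is the "log link", just as with sigma_region above
y ~ poisson(exp(eta));
y ~ poisson_log(eta); // built-in log-link version, usually more numerically stable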

Does that make sense?


Thank you, Martin. It works!
