# How to specify the mean and variance of a lognormal in a vectorized fashion?

Hello. I am transitioning from JAGS to Stan and trying to understand how to specify the mean and variance of a lognormal in a vectorized fashion. I copy-paste my code below. I think I should perform the vectorization in the “transformed parameters” chunk, right? Am I specifying the hierarchical mean and variance correctly? (@martinmodrak maybe you can help me here) Thank you!

``````data {
// N
int<lower=1> n_obs; // number of clusters
int<lower=1> n_region; // number of regions

int<lower=1,upper=n_region> region_id[n_obs]; // region id

int<lower=0> pop_count[n_obs]; // pop count per cluster
vector<lower=0> [n_obs]building_count; // building count per cluster
}

parameters {
vector<lower=0> [n_obs] pop_density; // pop density per cluster

real mu_alpha_global; // global mean
real<lower=0> sigma_alpha_global; //national variance

vector [n_region] mu_alpha_region; //region mean
real<lower=0> sigma_alpha_region; //region variance
vector [n_region] mu_sigma_region; //region mean variance
real<lower=0> sigma_sigma_region; //region type variance variance
}

transformed parameters {
vector [n_region] alpha_region; //region alpha
vector<lower=0> [n_region] sigma_region; //region sigma

alpha_region = mu_alpha_global + mu_alpha_region*sigma_alpha_region; //vectorized region alpha estimation
sigma_region = sigma_alpha_global + mu_sigma_region*sigma_sigma_region; //vectorized region sigma estimation
}

model {
//hyperpriors
mu_alpha_global ~ normal(0,1);
sigma_alpha_globa; ~ normal(0,1);

mu_alpha_region ~ normal(0,1);
sigma_alpha_region ~ normal(0,1);

mu_sigma_region ~ normal(0,1);
sigma_sigma_region ~ normal(0,1);

// hierarchical alpha and sigma per region
pop_density ~ lognormal(alpha_region[region_id], sigma_region[region_id]);
// population total
pop_total ~ poisson(pop_density  .* building_count);
}``````
2 Likes

Hi,
I am not sure I understand what the model aims to do, but it looks like it is going roughly in a sensible direction towards a Poisson-LogNormal model defined via latent variables (data augmentation). I’ve noticed some people prefer Poisson-LogNormal to Neg. binomial and similar (which do not require latent parameters), but I admit I have no idea where each of those distribution works better.

The difference between computing a variable in `transformed parameters` as opposed to `model` is mostly that `transformed parameters` get stored in the fit while local variables in the model block do not. So for many use cases it doesn’t matter that much where you compute your values.

Usually when one models sigma (or any other parameter bound to be positive) as varying, one would model the logarithm of sigma. Your code currently does not ensure that `sigma_region` is going to be positive for all parameter values, which is likely to lead to problems when sampling.

It seems that the variable `n_train` is not defined in the code but used at few places, so not sure what to make of it. But if `region_id` and `region_id_train` are supposed to be the same thing, than this looks sensible…

In any case a good way to check that your model is to test if you can recover parameters from simulated data.

Best of luck with transitioning to Stan - it can be inimidating, but I hope you’ll find it worth the effort…

1 Like

I am using this model — https://www.pnas.org/content/117/39/24173 — as a baseline for a similar study. First off, I just edited my question because I mistakenly used `n_train` instead of `n_obs` and `region_id_train` instead of `region_id` n some parts. I was trying to simplify the original model for this question but forgot to rename some parameters.

I found it hard to understand the difference between `transformed parameters` and `model` but now it is clear. Also, I have noticed that in some iterations I had a negative `sigma_region`. I will model it using the logarithm. Thanks for the hint!

I am currently working with simulated data so it will be straightforward to check if the model works!

Hi @martinmodrak ,
Would that mean that bounding the support of sigma to be above zero won’t ensure sigma to be positive?
Thks!

This is a good question and a point of frequent confusion. In general, Stan does very little magic with your model, because the code is treated as just a procedure to compute the target density given the parameters and Stan does not “understand” the model in any way. This absence of “understanding” actually allows Stan to be very flexible and fast but may be surprisng to people coming from e.g. JAGS which AFAIK actually analyses and in some sense “understands” the model code.

Constraints on variables in Stan actually do two different things depending on the program block. When using constraints in the `parameters` block, Stan automatically transforms parameters from unconstrained scale to create values that satisfy the constraints (in `parameters`, Stan is in control of the values and the user can’t change them). Constraints are also valid in the `transformed parameters` and `generated quantities` blocks, but there they serve only as checks - here, the user is responsible for computing the values based on parameters. Stan’s contribution is that at the end of the block, it checks that the computed values satisfy the constraints and Stan will throw an error if this is not the case.

So for computed values, the user is responsible for making sure the constraints are met - there are multiple ways to do this and modelling the logarithm is just one that was frequently found useful.

Hope that clarifies more than confuses :-)

1 Like

It makes sense. Thank you for the clarification. If I understand correctly, I should change the part in the `transformed parameters` as follows? After testing it, it works but not sure if it is the proper way to do it…

`````` transformed parameters {
vector [n_region] alpha_region; //region alpha
vector<lower=0> [n_region] sigma_region; //region sigma
vector<lower=0> [n_region] exp_sigma_region; //exp region sigma

alpha_region = mu_alpha_global + mu_alpha_region*sigma_alpha_region; //vectorized region alpha estimation
exp_sigma_region = exp(sigma_alpha_global + mu_sigma_region*sigma_sigma_region); //vectorized region     sigma estimation

sigma_region=log(exp_sigma_region)
}``````

My suggestion was more like (keeping only the stuff related to sigma):

``````parameters {
real sigma_alpha_global; //no bounds here
vector [n_region] mu_sigma_region;
real<lower=0> sigma_sigma_region;
}

transformed parameters {
vector<lower=0> [n_region] sigma_region; //region sigma

sigma_region = exp(sigma_alpha_global + mu_sigma_region*sigma_sigma_region); //exp ensures the result is positive for all parameter values
}
``````

It is really exactly the same thing as the “log link” you see when people use for e.g. Poisson response.

Does that make sense?

1 Like

Thank you, Martin. It works!

1 Like