Hi all, I have a dataset that overall follows a Pareto or power law distribution, but the exponent \alpha can vary by group. I am assuming that I would want to specify group as a fixed effect. I was not sure how to specify this model. So far I have Stan code to fit the Pareto distribution but I don’t know how to allow \alpha to vary by group within a single model.

Here is some example data I generated. The x_{min} parameter for each group is fixed at 1 and the \alpha for each group is drawn from a distribution centered around \alpha_{mean} = 2.

library(Pareto)
library(ggplot2)
library(dplyr)
set.seed(222)
alpha_mean <- 2
alpha_group <- rnorm(3, mean = alpha_mean)
dat <- data.frame(group = 1:3, x_min = 1, alpha = alpha_group) %>%
group_by(group, x_min, alpha) %>%
summarize(x = rPareto(10000, t = x_min, alpha = alpha))
# What does it look like?
ggplot(dat, aes(x = x, group = factor(group), fill = factor(group))) +
geom_histogram(position = 'dodge') +
scale_x_log10() + scale_y_log10(expand = c(0,0))

Histograms overlaid by group:

Below is the Stan code I have so far, which does not account for group-level variation in \alpha. Any help would be greatly appreciated.

data {
int<lower=0> N;
vector<lower=0>[N] x;
real<lower=0> x_min;
}
parameters {
// Pareto density
real<lower=0, upper=5> alpha;
}
model {
// Prior: Pareto density
alpha ~ lognormal(1, 1) T[0, 5];
// Likelihood: Pareto density
x ~ pareto(x_min, alpha);
}
generated quantities {
vector[N] log_lik; // Log-likelihood for getting info criteria later
for (i in 1:N) {
log_lik[i] = pareto_lpdf(x[i] | x_min, alpha);
}
}

Hi,
if you have a fixed effect for alpha and no other variables in the model, the fixed effects formulation is almost equivalent to fitting a separate model for each group - the only difference is in how you setup priors. So if you don’t need to make the model more complex further down the road, and if you don’t have strong opinion on how to setup the priors, I would just fit the model separately to each group.

Additionally hard bounds like real<lower=0, upper=5> alpha are usually not recommended - if you
believe alpha should not be higher than 5, it is usually better to express this as a “soft constraint” in your prior. E.g. a alpha ~ gamma(3, 1.5) will mimick your current setup in that it will put low mass below roughly 0.4 and above 5. Hard constrartins should usually be reserved for “physical constraints” - in this case the lower bound of 0 is necessary for the distribution to be defined and should thus be kept as hard constraint, while the upper bound is likely just an element of the prior.

Thanks very much for your input, it’s really helpful!

I see the rationale for why it is almost as good to fit a separate model for each group, but it just “feels wrong” to me in a way. Isn’t it capturing some kind of non-independence of the alpha coefficients for each group if you fit group as a random effect in one big model? Or am I misunderstanding?

To respond to your point about the truncation: yes I agree that isn’t ideal. I believe I copypasta’ed it from some Stan forum post or Stackoverflow post from years ago. I will look into better priors that don’t require that hacky truncation.

Also, I have gotten a reasonably well-functioning model coded up which fits an overall mean for alpha and a random effect for each group. Here is the Stan code. Other than the bad use of truncation, is this legitimate?

data {
int<lower=0> N;
int<lower=0> M; // number of groups
vector<lower=0>[N] x;
int<lower=1,upper=M> fg[N]; // Mapping to groups 1-M
real<lower=0> x_min;
}
parameters {
// Pareto density
real<lower=0, upper=5> alpha_mean;
vector[M] alpha_fg; // Deviation of each group from mean intercept
real<lower=0,upper=10> sigma_alpha; // Variation in intercepts
}
model {
// Prior: Pareto density
alpha_mean ~ lognormal(1, 1) T[0, 5];
alpha_fg ~ normal(0, sigma_alpha);
sigma_alpha ~ exponential(1);
// Likelihood: Pareto density
x ~ pareto(x_min, alpha_mean + alpha_fg[fg]);
}

I think that’s a model that could work in many circumstances, but not always. A couple points:

Yes, you are right fitting each group separately is often noticeably different than using a random/varying effect for group. Fitting separately is however almost the same as fitting a fixed effect for each group - which is what was mentioned in your first post.

The alpha parameter for the pareto distribution has to be positive, but your current model admits configurations where alpha_mean + alpha_fg[fg] < 0 for some fg - this may cause problems with initialization and if there is non-negligible posterior mass for alpha_mean + alpha_fg[fg] close to 0 for some group, it may also cause problems when fitting. The typical (but not the only) way to resolve this is to always model positive parameters on the log scale, i.e. you would have:

parameters {
real log_alpha_mean;
...
}
model {
log_alpha_mean ~ normal(1, 1);
...
x ~ pareto(x_min, exp(alpha_mean + alpha_fg[fg]);
}

(if you really want that upper bound, than it would translate to real<upper= log(5) log_alpha_mean).
This is a very similar model to the one you have, but now you have multiplicative random effects, which means the interpretation of sigma is quite different (and may justify a very different prior).

If you really don’t like the properties of working on the log scale, you can use log1p_exp instead of exp - this will still ensure that the result is positive and the pareto distribution is always defined, but log1p_exp(x) is approximately x when x gets far from zero so if all your alpha values are large, it will give almost the same answer as the model you posted, while preventing the potential issues as alpha gets close to zero.