Fitting a distribution across multiple sites without using a for loop

There is probably a really easy solution to this, but I’m coming up short. I have code where I fit a Pareto distribution to data, and I’m looking to find out what the alpha parameter is for the distribution:

data {
	int<lower=0> N;
	real<lower=0> x_min;
	vector<lower=0>[N] x;
}

parameters {
	// Pareto density
	real<lower=0, upper=5> alpha;
}

model {
	// Prior: Pareto density
	alpha ~ lognormal(1, 1) T[0, 5];
	// x_min uniform on its interval.
	
	// Likelihood: Pareto density
	x ~ pareto(x_min, alpha);
}

This works well for a single measured site, where

   
N <- length(data$x)
x <- data$x
x_min <- x_min

stan_dat <- list(N = N, x = x, x_min = x_min)

However, I have more than one site (so, data$site_label exists). I could just fit a distribution to each site in a for loop in R and get the outputs individually, but is there a way to do all that in just one model? How would I code that?

Whether you implement it with matrix operations or a for loop, as long as each term increments the log posterior density, it will all be fit as one model. If you want the data from each site to inform the parameters for the other sites, you could use a hierarchical model by choosing some hyperpriors for the alpha distribution, for example:

data{

 int<lower=0> N;
 int<lower=0> nSites;
 vector<lower=0>[nSites] x_min;
 array[nSites] vector<lower=0>[N] x;

}

parameters {

 real alphaMu;
 real<lower=0> alphaSigma;

 vector<lower=0, upper=5>[nSites] alpha;

}

model{

 alphaMu ~ normal(0,1);
 alphaSigma ~ lognormal(0,1);

 // truncation is only defined for scalar variates, so this cannot be vectorized
 for (site in 1:nSites) {
  alpha[site] ~ lognormal(alphaMu, alphaSigma) T[0, 5];
 }

 for (site in 1:nSites) {
  target += pareto_lpdf(x[site] | x_min[site], alpha[site]);
 }

}

However, I do suspect you’ll run into some computational issues from taking a lognormal distribution and truncating it to (0, 5), regardless of the mu and sigma the model finds for the alpha distribution.
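One way to sidestep that truncation (a sketch, not from the original post; alphaRaw is a name I’ve introduced) is to drop the (0, 5) bounds entirely and write the hierarchy noncentered on the log scale, which keeps alpha positive without a truncated prior:

parameters {

 real alphaMu;
 real<lower=0> alphaSigma;

 // noncentered: standard-normal raw parameters
 vector[nSites] alphaRaw;

}

transformed parameters {

 // implies alpha ~ lognormal(alphaMu, alphaSigma)
 vector<lower=0>[nSites] alpha = exp(alphaMu + alphaSigma * alphaRaw);

}

model {

 alphaMu ~ normal(0, 1);
 alphaSigma ~ lognormal(0, 1);
 alphaRaw ~ std_normal();

 for (site in 1:nSites) {
  target += pareto_lpdf(x[site] | x_min[site], alpha[site]);
 }

}

Note this sketch has no upper bound of 5 on alpha; if you truly need one, the truncated version above is the direct translation.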

Without hierarchical modeling:

data{

 int<lower=0> N;
 int<lower=0> nSites;
 vector<lower=0>[nSites] x_min;
 array[nSites] vector<lower=0>[N] x;

}

parameters {

vector<lower=0, upper=5>[nSites] alpha;

}

model{

 // truncation requires a scalar variate, so loop over sites
 for (site in 1:nSites) {
  alpha[site] ~ lognormal(1, 1) T[0, 5];
 }

 for (site in 1:nSites) {
  target += pareto_lpdf(x[site] | x_min[site], alpha[site]);
 }

}

Ah, thank you! That second one without hierarchical modeling is definitely the one I was looking for. I was using a for loop external to the modeling process (meaning, rerunning the model independently for each site I wanted to look at), so this version is perfect.

I’m not familiar with making an array to prep for that code, is this how you do it?

# Prepare data for Stan
stan_data <- list(
  N = nrow(df),                   # Number of observations
  nSites = length(unique(df$site)),  # Number of sites
  x_min = df$x_min,               # Minimum values of 'x' for each site
  x = split(df$x, df$site)        # Split 'x' by site into a list
)

# Convert 'x' to array
stan_data$x <- lapply(stan_data$x, as.array)

# Convert list to array
stan_data$x <- array(unlist(stan_data$x), dim = c(length(stan_data$x[[1]]), length(stan_data$x)))

You would use a struct variable where x is a cell array with one cell per site, then save the struct as a .json file to use as the input file for Stan.

If this were MATLAB, you would have:

Var.N
Var.nSites
Var.x_min
Var.x{site}

jsonwrite(Var, 'filename'), where jsonwrite is your .json writing function and 'filename' is the name of the output file. This is my handwritten MATLAB function for that:

function [] = jsonwrite(var,filename)

%used to write a structured variable to a json file

%turn the data matrix into its json equivalent

jsf = jsonencode(var);

%prepare the file name

filesave = strcat(filename,'.json');

%begin a file with writing permission

fid = fopen(filesave,'w');

%write to text file

fprintf(fid,'%s',jsf);

%close the file

fclose(fid);

end

One thing that occurs to me: I don’t have the same number of data points x for each site. Wouldn’t an array need to have the same number of data points for each site?

You would have to pad your data: Stan does not directly support ragged data structures.
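A common pattern for this (a sketch; maxN and n are names I’m introducing, not from the thread) is to pad each site’s x out to the length of the longest site, pass the true per-site counts as data, and only evaluate the leading entries of each row with head():

data {

 int<lower=0> maxN;                      // length of the longest site
 int<lower=0> nSites;
 array[nSites] int<lower=0> n;           // actual observation count per site
 vector<lower=0>[nSites] x_min;
 array[nSites] vector<lower=0>[maxN] x;  // entries past n[site] are padding (any value >= 0)

}

parameters {

 vector<lower=0, upper=5>[nSites] alpha;

}

model {

 for (site in 1:nSites) {
  alpha[site] ~ lognormal(1, 1) T[0, 5];
  target += pareto_lpdf(head(x[site], n[site]) | x_min[site], alpha[site]);
 }

}

head(x[site], n[site]) returns the first n[site] elements of the row, so the padded tail never enters the likelihood.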