Hello!
I am quite new to Stan, and I am trying to self-train with some exercises (that might apply to my research in general).
As an example, I have created a list in R with:
- N = Number of sites
- X = Number of pig bones found in a site
- T = Total bones (of any animal) found in a site,
- radiocarbon = The date in years to which this context is dated (ranging from -6000 BC to - 3000).
- radiocarbon_sigma = A measurement error for the date.
# Number of sites
N <- 500
# Generate fake radiocarbon dates (years) with measurement errors
radiocarbon <- floor(runif(N, -6000, -3000))
sigma <- floor(runif(N, min = 0, max = 100)) # Measurement errors
# Generate fake total number of bones for each site
total_bones <- round(runif(N, min = 100, max = 1000))
# Generate fake number of pig bones for each site (assuming proportion varies)
p_true <- runif(N, min = 0, max=1) # True proportion of pig bones
pig_bones <- rbinom(N, total_bones, p_true)
# Putting all this together
data_list <- list(
N = N,
X = pig_bones,
T = total_bones,
radiocarbon = radiocarbon,
sigma_radiocarbon = sigma
)
Now, ideally I would like to have a binomial model (or even better a beta-binomial model) to estimate the probability of finding pig bones at a certain calendar date, so I can then plot the posterior distribution to see the temporal variation in proportions along with a credible interval. I have tried the following code, but it is full of issues:
data {
int<lower=0> N; // Number of sites
int<lower=0> X[N]; // Number of pig bones
int<lower=0> T[N]; // Total number of bones
vector[N] radiocarbon; // Radiocarbon dates
vector<lower=0>[N] sigma_radiocarbon; // Measurement errors for radiocarbon dates
}
parameters {
real alpha; // Intercept
real beta; // Slope for radiocarbon dates
real date; // The predictor, i.e. the radiocarbon date
}
model {
vector[N] pbar;
for ( i in 1:N ) {
pbar[i] = alpha[i] + beta[i]*date[i];
pbar[i] = inv_logit(pbar[i]);
}
// Priors
alpha ~ normal(0, 1.5); // Prior for intercept
beta ~ normal(0, 1.5); // Prior for slope
// Likelihood
for (i in 1:N) {
X[i] ~ binomial(T[i], pbar[i]); // Binomial likelihood
date[i] ~ normal(radiocarbon[i], sigma_radiocarbon[i]);
}
}
Errors:
# Setting chain to 1 just for diagnostics
fit <- sampling(stan_model, data = data_list, chains = 1, iter = 1000, warmup = 500, cores = 2)
Chain 1: Rejecting initial value:
Chain 1: Error evaluating the log probability at the initial value.
Chain 1: Exception: normal_lpdf: Scale parameter is 0, but must be > 0! (in 'model70c11e48b364_c86e30e492361fb52409da14f1e4b216' at line 31)
...
I apologise-I know the code might be very messy, but I am trying to learn.
Thank you in advance to whoever is willing to help me improve this!
P.S. As an extra step if this model works I would like to add a varying effect because I might have more samples from the same site, so it would be good to take into account that source of variability as well. Would that be too difficult to add?