New Stan user here. I’m working with palaeontological data, consisting of L fossil sites and their ages and T species of animals that can be located in two or more sites. The occurence data is a [L, T] matrix where a value of 1 or 0 of its [l, t] element denotes the presence or absence of a species in a site. I am attempting to obtain appearance and disappearance ages and uncertainties for each species.
What would be the best way to express the posterior likelihood? I think IRT models ([1.11 Item-response theory models | Stan User’s Guide]1.11 Item-response theory models | Stan User’s Guide) are somewhat conceptually similar.
The resulting parameter distributions are all flat, so I think the posterior is wrong. The Python script and test data I use is below. I’d be grateful for any guidance!
data {
int<lower=0> T; // number of taxa
int<lower=0> L; // number of sites
real<lower=0,upper=70> loc_ages[L]; // locality ages
int<lower=0,upper=2>occ_matrix[L, T]; // 0-1 matrix indicating occurence/absence, L rows, T cols
}
parameters {
real <lower=0,upper=70> a[T]; //first appearance ages of taxa
real <lower=0,upper=40> duration[T]; // durations of taxon existence
}
model {
int theta = 0;
for (l in 1:L) {
for (t in 1:T) {
if (a[t] <= loc_ages[l] <= a[t]+duration[t]) // locality dates are between a[t] and a[t]+duration[t]
{
theta = 1;
}
else {
theta = 0;
}
occ_matrix[l,t] ~ bernoulli_logit(theta);
}
}
}
Here’s the Python script I use. The occurence matrix with test data is here: 10test.txt (110 Bytes).
T = 5
L = 10
ages = [1., 5, 10, 20, 25, 30, 40, 50, 55, 60]
durations = [15, 15, 30, 35, 24]
mat = np.genfromtxt(“10test.txt”, dtype=int)
data={‘T’:T, ‘L’:L, ‘loc_ages’:ages, ‘occ_matrix’:mat}
fit = model.sampling(data=data, seed=420, iter=10000, chains=4, warmup=5000, thin=1)