Hi,
It’s my first time posting on this forum so, first, I would like to thank the development team for the great work. I started to use Stan a few days ago and, as a total newbie, surprisingly I find it easier to use than PyMC.
Thanks to the outstanding documentation I have already successfully fitted a few simple hierarchical models. Now I need to do something a little unorthodox, so I would like to seek some advice.
I’m working on a collection of Chinese manuscripts dating from the end of the 4th century B.C. These documents contain a lot of dates, each of which consists of the official name of the year, the month, and the position of the day in the sexagenary cycle, a sequence of 60 terms used to reckon the days. Because the months follow the lunar cycle, their relation to the sexagenary cycle is not straightforward: some months have only 29 days, and sometimes an extra month is added at the end of the year, so it’s not possible to infer the position of the day in the month (lunar cycle) directly from its position in the sexagenary cycle.
In order to reconstruct the calendar that was used by the people who wrote these documents, I would like to find, for each date, the probability distribution of beta, the distance between the beginning of the month and the date, that is to say, the position of the date in the lunar cycle.
Thanks to the sexagenary cycle, it’s possible to infer the exact time span between two dates; for example, we know that there are exactly (60 - 15) + 60 (one full cycle) + 41 = 146 days between the two dates below:
- first date: The year of Song, the 4th month, the 15th day of the sexagenary cycle.
- second date: The year of Song, the 9th month, the 41st day of the sexagenary cycle.
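The counting rule in this example can be sketched in Python as follows. The function name `sexagenary_span` and the `full_cycles` argument are only illustrative names, and the number of complete cycles is assumed to be known from the months, since it cannot be read off the two positions alone.

```python
def sexagenary_span(start_pos, end_pos, full_cycles=0):
    """Days between two dates given by their positions (1..60) in the
    sexagenary cycle, following the counting rule of the example:
    days left in the starting cycle + complete cycles + position of the end day.
    """
    for pos in (start_pos, end_pos):
        if not 1 <= pos <= 60:
            raise ValueError("positions must be between 1 and 60")
    if full_cycles < 0:
        raise ValueError("full_cycles must be non-negative")
    return (60 - start_pos) + 60 * full_cycles + end_pos


# The two dates of the example: 15th day, one full cycle, then 41st day.
print(sexagenary_span(15, 41, full_cycles=1))  # (60 - 15) + 60 + 41 = 146
```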
My approach is to model this time span, y, as the sum of three terms: alpha, the number of days between the first date and the end of the month it belongs to; the product of the number of days in a month and x, the number of months between the two dates; and a Gaussian error standing for the number of days between the beginning of the last month and the second date. I would then like to transform alpha to get the distribution of beta.
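Written out, and assuming a fixed 30-day month as in the programme below, the model I have in mind is roughly the following, where i indexes the observations and sigma is the standard deviation of the error term:

```latex
y_i = \alpha + 30\,x_i + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma),
\qquad i = 1, \dots, N
```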
I randomly generated 100 dates replicating the dates of the manuscripts and I tested my model using the following Stan programme:
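data {
  int<lower=0> N;  // number of observations
  vector[N] x;     // number of months between the two dates
  vector[N] y;     // number of days between the two dates
}
parameters {
  real<lower=14, upper=22> alpha;  // days from the first date to the end of its month
  real<lower=0> sigma;             // scale of the Gaussian error
}
model {
  sigma ~ normal(0, 5);  // half-normal, since sigma is constrained to be positive
  y ~ normal(x * 30 + alpha, sigma);  // 30 = assumed number of days in a month
}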
Unfortunately, this model doesn’t work. Even when I use informative priors, the Gaussian error absorbs alpha. How can I prevent that? Does my approach make sense? Do you know a better way to get the distribution of beta?
Many thanks,
Colin