Can you help me to fix this calendar?

Hi,

It’s my first time posting on this forum so, first, I would like to thank the development team for the great work. I started to use Stan a few days ago and, as a total newbie, surprisingly I find it easier to use than PyMC.

Thanks to the outstanding documentation I have already successfully fitted some simple hierarchical model. Now I need to do something a little unorthodox, so I would like to seek some advice.

I’m working on a collection of Chinese manuscripts dating from the end of the IVth century B.C. These documents contain a lot of dates, which constitute of the official name of the year, the month and the position of the day in the sexagenary cycle – a sequence of 60 terms used to reckon the days. Because the months follow the lunar cycle, their relation to the sexagenary cycle is not straightforward: some months have only 29 days and sometimes an extra month can be added at the end of the year, so it’s not possible to infer the position of the day in the month (lunar cycle) directly from its position in the sexagenary cycle.

In order to reconstruct the calendar that was used by the people who wrote these documents, I would like for each date to find the probability distribution of beta the distance between the beginning of the month and the date, that is to say, the position the date in the lunar cycle.

Thanks to the sexagenary cycle, it’s possible to infer precisely the time span between two dates; for example, we know that there’s exactly (60 - 15) + 60 (one full cycle) + 41 = 146 days between the two dates below:

  • first date: The year of Song, the 4th month, the 15th day of the sexagenary cycle.
  • second date: The year of Song, the 9th month, the 41st day of the sexagenary cycle.

My approach is to model this time span, y, as the linear function of alpha, the number of day between the first date and the end of the month it belongs, the product of the number of days in a month and the number of months between the two dates, x, and a Gaussian error for the number of days between the beginning of the last month and the second date. Then I would like to transform alpha to get the distribution beta.

I randomly generated 100 dates replicating the dates of the manuscripts and I tested my model using the following Stan programme:

data {
int<lower=0> N;
vector[N] x;
vector[N] y;
}

parameters {
real<lower=14, upper=22> alpha;
real<lower=0> sigma;
}

model {
sigma ~ normal(0, 5);
y ~ normal(x * 30 + alpha, sigma);
}

Unfortunately, this model doesn’t work. Even when I use informative priors, the Gaussian error absorbs alpha. How can I prevent that? Does my approach make sense? Do you know a better way to get the distribution of beta?

Many thanks,

Colin

As a start you might want to figure out what values alpha could take here. I’m not sure why there’s only one alpha parameter or how it fits into the plot you post… so maybe this is unorthodox enough it needs more explanation.

Sorry, I tried to synthesize but, obviously, I missed some steps. I will edit my question but let me first clarify a little bit.

As I said, my goal is to build a model to get the probability distribution of beta, the distance between the date and the first day of the month it belongs to, and so for each date in the documents. But, before doing that, I want to test different approaches to model alpha, the distance between the date and the last day of the month it belongs to.

My question regards the test model below intended to get the probability distribution of alpha for only one date (let’s call it the target date):

  • alpha ~ DiscreteUniform(0, 30)
  • sigma ~ Half-Normal(0, 20)
  • Y ~ Normal(alpha + x * 29, sigma)

Where:

  • N is the number of dates located forward in time.

  • Y is a vector of length N that contains the distance between the target date and the each of the forward dates.

  • X is a vector of length N that contains the number of month separating the target date and each of the forward dates.

  • 29 is the minimum number of days in a month.

Since I’m just running a test, for the sake of simplicity, in my stan code alpha has a continuous distribution.

When I run this model against a replicated sample it fails lamentably to find the actual value of alpha: for example, when the target date is the 15th day of the month, the 95% credible interval for alpha is [3.02, 6.88] and for sigma it’s [7.5, 10.36].

Even when I use an informative prior, as in my first question, alpha is absorbed by the gaussian error.

Do you know what is the problem?

For this one date have you plotted y as a function of x? What do you get?

Thanks to your question, I realized that my variable was actually uniformly distributed. Using the following model I got very good result:

data {
int<lower=0> N;
int<lower=0> M;
int target_month;
real<lower=0, upper=30> alpha_min;
real<lower=0, upper=30> alpha_max;
real<lower=0, upper=30> day_min;
real<lower=0, upper=30> day_max;
vector[N] ante;
vector[N] month_n;
int<lower=0, upper=12> month[N];
vector[N] total_distance;
vector[N] beta_min;
vector[N] beta_max;
}

transformed data {
int x;
real synodic_month = 1 / (1/27.322 - 1/365.25);
}

parameters {
real<lower=day_min, upper=day_max> day_prime;
real<lower=alpha_min, upper=alpha_max> alpha;
real<lower=0, upper=1> r[M];
}

model {

for (n in 1:N){
  if (ante[n] == 0){
    total_distance[n] - (month_n[n] * synodic_month) - r[target_month] - r[month[n]] - alpha ~ uniform(beta_min[n], beta_max[n]);
  }
  else {
    total_distance[n] - (month_n[n] * synodic_month) - r[target_month] + r2[month[n]] - day_prime ~ uniform(beta_min[n], beta_max[n]);
    }
  }
}

generated quantities {
real alpha_prime;
real real_day;
real lambda = sum(ante) / N;
alpha_prime = synodic_month - alpha;
real_day = log_mix(lambda, day_prime, alpha_prime);
}

Now I would like to do the same but for multiple day_prime and alpha. I wrote the following code:

parameters {
real<lower=0, upper=30> day_prime[D];
real<lower=0, upper=30> alpha[D];
real<lower=0, upper=1> r[M];
}

model {

for(d in 1:D){
    day_prime[d] ~ uniform(day_min[d], day_max[d]);
    alpha[d] ~ uniform(alpha_min[d], alpha_max[d]);
}

for(d in 1:D){
  for (n in 1:N){
    if (ante[n] == 0){
      total_distance[d, n] - (month_n[n] * synodic_month) - r[target_month] - r[month[n]] - alpha[d] ~ uniform(beta_min[n], beta_max[n]);
    }
    else {
      total_distance[d, n] - (month_n[n] * synodic_month) - r[target_month] + r[month[n]] - day_prime[d] ~ uniform(beta_min[n], beta_max[n]);
      }
  }
}

Unfortunately, this model always fails during initialization. I guess it’s because the support of day_prime and alpha is very “narrow”. In my first model I worked around this by setting very tight bound, but can I do the same for a vector or an array of variables where each variable has a different support and so need different bounds?

By the way, I pretty sure that my code is very ugly. Any advice on how I should improve it is warmly welcome!

Many thanks,

Colin

Solution found in the section 22.4 of the manual.

1 Like