Nested ordered parameters

I have a tricky problem. I have a phylogenetic tree structure (the tree is known) with age estimates for each node, where every age is approximated with a Gamma distribution, with known shape and scale/rate. I want to constrain the distributions such that ordering is respected as only the means of the Gamma distributions are currently strictly ordered.

I only have working code for one leaf’s path to the root, assuming the data is ordered from leaf to root.

data {
  int<lower=0> N;
  vector[N] shape;
  vector[N] rate;
}

parameters {
  positive_ordered[N] age;
   }
   
model {
  for(n in 1:N){
    age[n] ~ gamma(shape[n], rate[n]);
    }
  }

This model works well, however, is there any good way to impose ordering in a tree structure? This is a possible (very small example) structure:

a < b < c
d < b < c
e < f < c
    g < c

Both a and d should be younger than b, e should be younger than f, and b, f, g should all be younger than c.

Here is an example dataset (R code) with reverse engineered parameters from FamilyTreeDNA’s website for Y-DNA haplogroups:

tibble::tribble(~parentname, ~name, ~shape, ~rate,
NA_character_, "I-Y15154", 55.4,  0.0420,
"I-Y15154",   "I-BY70273", 40.9, 0.0419,
"I-BY70273",  "I-FGC18186", 35.0, 0.0432,
"I-FGC18186", "I-Y59437", 27.7, 0.0365,
"I-Y59437",   "I-BY61123", 8.83, 0.0493,
"I-Y59437",   "I-FTC12859", 10.3, 0.0329,
"I-FGC18186", "I-FTB41323", 11.8, 0.0227,
"I-Y15154",   "I-Y17669", 17.8, 0.0139,
"I-Y17669",   "I-Y17667", 6.11, 0.0109,
"I-Y15154",   "I-BY115311", 16.8, 0.0242,
"I-BY115311", "I-Y88749", 10.3, 0.0319)
2 Likes

Do you want the margins of your final result to correspond to the original marginal Gammas? Or do you want them to gain information from one another? For example, suppose there is a rapid radiation somewhere on your tree, leading to a bunch of nodes that all have similar age estimates, with similar gamma distributions. If you believe those age estimates are independent, then you would want to constrain nodes whose parents and daughters (and grandparents and granddaughters) are also part of the same radiation (with similar age estimates) to a much tighter region in the middle of the gamma. On the other hand, if you believe that you essentially have a single estimate of the age of the radiation, with all of the relevant nodes receiving essentially the same non-independent gamma distribution, then you would want to preserve these highly similar marginal distributions for all of the nodes, even as you ensure the ordering.

Most obviously, if these gamma distributions are in fact known to be the margins of some unknown posterior that does respect the ordering constraint, and which you are trying to recreate, then you would not want to allow the ordering to modify the margins.

For what it’s worth, I tend to think that you do not want to treat these marginal estimates as independent–that would be like having an independent fossil calibration for each node. Of course if you do have something like that, then you can certainly leverage the date ranges associated with each fossil to produce a really well constrained tree! But let’s not figure out how to cross that bridge unless it’s the right one to cross.

Also tagging @maxbiostat

It is the right bridge to cross! The desired property of the output implies the “independence assumption”. The issue is that the originals marginals are impossible if the ordering of the random variables are imposed (except for very few cases with long times between its parent node and children nodes.) So my goal is really only to adjust the marginals, but to do that we need to sample from the joint with constraints. :)

A more brute force approach that does what I want to do is this:

library(tidyverse)
library(posterior)

tribble(~parentname, ~name, ~shape, ~rate,
        NA_character_, "I-Y15154", 55.4,  0.0420,
        "I-Y15154",   "I-BY70273", 40.9, 0.0419,
        "I-BY70273",  "I-FGC18186", 35.0, 0.0432,
        "I-FGC18186", "I-Y59437", 27.7, 0.0365,
        "I-Y59437",   "I-BY61123", 8.83, 0.0493,
        "I-Y59437",   "I-FTC12859", 10.3, 0.0329,
        "I-FGC18186", "I-FTB41323", 11.8, 0.0227,
        "I-Y15154",   "I-Y17669", 17.8, 0.0139,
        "I-Y17669",   "I-Y17667", 6.11, 0.0109,
        "I-Y15154",   "I-BY115311", 16.8, 0.0242,
        "I-BY115311", "I-Y88749", 10.3, 0.0319) %>% 
  mutate(index = row_number(),
         parent_index = match(parentname, name)) %>% 
  replace_na(list(parent_index = 0)) -> 
  tree_data

draws <- tree_data %>% 
  mutate(samples = map2(shape, rate, ~rgamma(n = 250000, shape = .x, rate = .y) %>% rvar())) %>% 
  unnest(samples) %>% 
  pull(samples) %>% 
  as_draws_df()


draws %>% 
  filter(`x[1]` > `x[2]`,
         `x[1]` > `x[8]`,
         `x[1]` > `x[10]`,
         `x[2]` > `x[3]`,
         `x[3]` > `x[4]`,
         `x[3]` > `x[7]`,
         `x[4]` > `x[5]`,
         `x[4]` > `x[6]`,
         `x[8]` > `x[9]`,
         `x[10]` > `x[11]`) %>% 
  as_draws_rvars() ->
  draws_filtered_rvars

as_draws_rvars(draws %>% sample_n(ndraws(draws_filtered_rvars$x[1]))) -> draws_new_length

tree_data %>% 
  mutate(constrained = 2023-draws_filtered_rvars$x,
         unconstrained = 2023-draws_new_length$x)

with the output:

> tree_data %>% 
+   mutate(constrained = 2023-draws_filtered_rvars$x,
+          unconstrained = 2023-draws_new_length$x)
# A tibble: 11 × 8
   parentname name       shape   rate index parent_index constrained unconstrained
   <chr>      <chr>      <dbl>  <dbl> <int>        <int>  <rvar[1d]>    <rvar[1d]>
 1 NA         I-Y15154   55.4  0.042      1            0   627 ± 162     704 ± 177
 2 I-Y15154   I-BY70273  40.9  0.0419     2            1  1002 ± 125    1047 ± 153
 3 I-BY70273  I-FGC18186 35    0.0432     3            2  1191 ± 101    1212 ± 137
 4 I-FGC18186 I-Y59437   27.7  0.0365     4            3  1343 ± 103    1263 ± 144
 5 I-Y59437   I-BY61123   8.83 0.0493     5            4  1845 ± 60     1844 ± 60 
 6 I-Y59437   I-FTC12859 10.3  0.0329     6            4  1712 ± 95     1710 ± 98 
 7 I-FGC18186 I-FTB41323 11.8  0.0227     7            3  1520 ± 133    1502 ± 151
 8 I-Y15154   I-Y17669   17.8  0.0139     8            1   910 ± 193     742 ± 304
 9 I-Y17669   I-Y17667    6.11 0.0109     9            8  1484 ± 201    1462 ± 227
10 I-Y15154   I-BY115311 16.8  0.0242    10            1  1323 ± 165    1327 ± 170
11 I-BY115311 I-Y88749   10.3  0.0319    11           10  1705 ± 96     1701 ± 101

In that case, I think the simplest option will be to iterate across the tips of the tree, and find the deepest node that you haven’t already included (on your first iteration, it will be the root; on subsequent iterations it will be the daughter of the last common ancestor), and then apply your method to the tree (edit: branch) from that tip to that internal node, with the constraint that the internal node is younger than its parent.

Out of curiosity, what sort of datastream produces data like these, with completely independent uncertainty quantification of the ages of all internal nodes?

As not all (if any?) details are published about the datastream I think I have to wait until the procedure is published. What I can say is that the Gamma distributions could be understood as approximations to a posterior predictive distribution, and hence are in that sense independent (but not subject to any further constraints other than that the means are correct from an ordering perspective).

I am not sure I follow. You mean in a Stan model, or like my brute force approach? The issue is that the brute force approach is very computationally expensive. If I could figure out on how to sample this in Stan it would be much more efficient.

Imagine a topology where the root is A, with a linear chain of descendant nodes (B, C, D, E), and where B is a branch point and also separately has the descendant chain (F, G, H), and where F is also a branch point with descendant chain (I, J)

Then if bounded ordered vectors were available, you could do

parameters{
  ordered[5] branch1; // for A, B, C, D, E
  ordered<lower = branch1[2]>[3] branch2; // for F, G, H
  ordered<lower = branch2[1]>[2] branch3; // for I, J
}

Since bounded ordered vectors are not currently available, you would unfortunately have to do something like

parameters{
  ordered[5] branch1; // for A, B, C, D, E
  real<lower = branch1[2]> F;
  real<lower = F> G;
  real<lower = G> H;
  real<lower = F> I;
  real<lower = I> J;
}

which would get pretty unwieldy! Not sure whether there’s a good way to keep the code more concise. Hopefully @spinkney has a good idea here!

Let’s see if this will work for you. I don’t understand how your output above works because x[11] is clearly greater than x[10]. Or maybe filter is removing those where that’s the case? Idk, I don’t use tidyverse. It doesn’t really matter though because you can reverse what I’ve done.

Let’s say the constraints are

         `x[1]` > `x[2]`,
         `x[1]` > `x[8]`,
         `x[1]` > `x[10]`,
         `x[2]` > `x[3]`,
         `x[3]` > `x[4]`,
         `x[3]` > `x[7]`,
         `x[4]` > `x[5]`,
         `x[4]` > `x[6]`,
         `x[8]` > `x[9]`,
         `x[10]` > `x[11]`

And you’ve told us that each one has a marginal gamma distribution with given scale and rate. There’s a neat trick here using the F distribution (see F-distribution - Wikipedia).

Ripping straight from wikipedia, if X_k \sim \Gamma(\alpha_k,\beta_k)\, Gamma distribution are independent, then \frac{\alpha_2\beta_1 X_1}{\alpha_1\beta_2 X_2} \sim \mathrm{F}(2\alpha_1, 2\alpha_2)

We can rewrite the above constraints so that the F distribution is truncated at 1 for the given X_1, X_2.

Then pass in all the indexes for your pairwise constraints and things seem to work well.

functions {
 real F_lpdf(real x, real d1, real d2) {
   real d1_half = 0.5 * d1;
   real d2_half = 0.5 * d2;
   
  return -lbeta(d1_half, d2_half) + d1_half * log(d1 / d2) + 
    (d1_half - 1) * log(x) - (d1_half + d2_half) * log1p(x * d1/d2); 
 }
 
 real F_lcdf(real x, real d1, real d2) {
   return log(inc_beta(0.5 * d1, 0.5 * d2, (d1 * x) / (d1 * x + d2)));
 }
 
}
data {
  int<lower=0> N;
  vector[N] shape;
  vector[N] rate;
  int<lower=1> P; // number of pairs
  array[P, 2] int P_index;
  int<lower=1, upper=N> max_index;
}
transformed data {
  vector[P] pairs_coef;
  
  for (i in 1:P) {
    pairs_coef[i] = (shape[P_index[i, 2]] * rate[P_index[i, 1]]) / (shape[P_index[i, 1]] * rate[P_index[i, 2]]);
  }
  
}
parameters {
  real<lower=0> max_age;
   vector<lower=0, upper=pairs_coef>[P] ratio_age;
}
transformed parameters {
  vector[P] ratio_age_raw = ratio_age ./ pairs_coef;
  }
model {
 max_age ~ gamma(shape[max_index], rate[max_index]); 

 for (i in 1:P) {
  target += F_lpdf(ratio_age[i] | 2 * shape[P_index[i, 1]], 2 * shape[P_index[i, 2]]) ;
  target += F_lcdf(pairs_coef[i] | 2 * shape[P_index[i, 1]], 2 * shape[P_index[i, 2]]);
 }
 
}
generated quantities {
  vector[N] ages;
  ages[max_index] = max_age;

  int p = 2;
  for (i in 1:P) {
    ages[p] = ages[P_index[i, 2]] * ratio_age_raw[i];
    p += 1;
  }
}

The corresponding R code

library(tidyverse)
library(posterior)

tribble(~parentname, ~name, ~shape, ~rate,
        NA_character_, "I-Y15154", 55.4,  0.0420,
        "I-Y15154",   "I-BY70273", 40.9, 0.0419,
        "I-BY70273",  "I-FGC18186", 35.0, 0.0432,
        "I-FGC18186", "I-Y59437", 27.7, 0.0365,
        "I-Y59437",   "I-BY61123", 8.83, 0.0493,
        "I-Y59437",   "I-FTC12859", 10.3, 0.0329,
        "I-FGC18186", "I-FTB41323", 11.8, 0.0227,
        "I-Y15154",   "I-Y17669", 17.8, 0.0139,
        "I-Y17669",   "I-Y17667", 6.11, 0.0109,
        "I-Y15154",   "I-BY115311", 16.8, 0.0242,
        "I-BY115311", "I-Y88749", 10.3, 0.0319) %>% 
  mutate(index = row_number(),
         parent_index = match(parentname, name)) %>% 
  replace_na(list(parent_index = 0)) -> 
  tree_data

mod_order <- cmdstan_model("order_constraint.stan")

P_index <- array(dim = c(10, 2))
j <- 2
for (i in 1:10){
  P_index[i, ] <- as.matrix(tree_data[j, 5:6])
  j <- j + 1
}

d <- list(
  N = nrow(dat),
  shape = tree_data$shape,
  rate = tree_data$rate,
  P = 10,
  P_index = P_index,
  max_index = 1
)

mod_out <- mod_order$sample(
  data = d,
  iter_warmup = 1000,
  iter_sampling = 1000,
  parallel_chains = 4
)
mod_out$summary("ages")

And the output

# A tibble: 11 × 10
   variable  mean median    sd   mad     q5   q95  rhat ess_bulk ess_tail
   <chr>    <num>  <num> <num> <num>  <num> <num> <num>    <num>    <num>
 1 ages[1]  1319.  1314. 178.  178.  1041.  1620.  1.00    4753.    2785.
 2 ages[2]   956.   941. 216.  221.   628.  1331.  1.00    4343.    2895.
 3 ages[3]   743.   724. 208.  210.   434.  1108.  1.00    3511.    3017.
 4 ages[4]   598.   573. 193.  189.   320.   947.  1.00    4349.    2961.
 5 ages[5]   146.   132.  77.4  65.0   54.7  297.  1.00    3644.    2907.
 6 ages[6]   255.   229. 129.  109.    95.9  503.  1.00    4382.    2693.
 7 ages[7]   458.   436. 186.  186.   203.   813.  1.00    4142.    2864.
 8 ages[8]  1051.  1044. 228.  225.   683.  1434.  1.00    5203.    3113.
 9 ages[9]   474.   432. 230.  215.   174.   911.  1.00    4127.    2504.
10 ages[10]  705.   683. 219.  210.   389.  1107.  1.00    3389.    1602.
11 ages[11]  339.   307. 167.  150.   131.   665.  1.00    3663.    2251.

And let’s check the constraints

ages <- mod_out$draws("ages", format = "matrix")
for (i in 1:10) {
  print(table(ages[, P_index[i, 1]] < ages[, P_index[i, 2]]))
}
TRUE 
4000 

TRUE 
4000 

TRUE 
4000 

TRUE 
4000 

TRUE 
4000 

TRUE 
4000 

TRUE 
4000 

TRUE 
4000 

TRUE 
4000 

TRUE 
4000 
3 Likes

@spinkney suggested this code on Slack, after some suggestions from @jsocolar, where the tree topology is fixed:

functions {
  vector ordered_lb_lp (vector y, real lb) {
    int N = rows(y);
    vector[N] x;
    
    x[1] = exp(y[1]) + lb;
    target += exp(y[1]);
    
    for (i in 2:N) {
      x[i] = exp(y[i]) + x[i - 1];
      target += exp(y[i]);
    }
    
    return x;
  }

  vector ordered_ub_lp (vector y, real ub) {
    int N = rows(y);
    vector[N] x;
    
    x[1] = ub - exp(y[1]);
    target += exp(y[1]);
    
    for (i in 2:N) {
      x[i] = x[i - 1] - exp(y[i]);
      target += exp(y[i]);
    }
    
    return x;
  }
  
  vector ordered_lb_ub_lp (vector y, real lb, real ub) {
    int N = rows(y);
    vector[N] x;
    
    x[1] = lb + (ub - lb) * inv_logit(y[1]);
    target += log(ub - lb) + log_inv_logit(y[1]) + log1m_inv_logit(y[1]);
    
    for (i in 2:N) {
      x[i] = x[i - 1] + (ub - x[i - 1]) * inv_logit(y[i]);
      target += log(ub - x[i - 1]) + log_inv_logit(y[i]) + log1m_inv_logit(y[i]);
    }
    
    return x;
    
  }
}
data {
  int<lower=0> N;
  vector[N] shape;
  vector[N] rate;
}
parameters {
  real<lower=0> x1;
  vector[4] branch1_raw;
  vector[2] branch2_raw;
  vector[2] branch3_raw;
  real x7_raw;
  real x6_raw;
}
transformed parameters {
  vector[4] branch1 = reverse(ordered_lb_ub_lp(branch1_raw, 0, x1));
  vector[2] branch2 = reverse(ordered_lb_ub_lp(branch2_raw, 0, x1));
  vector[2] branch3 = reverse(ordered_lb_ub_lp(branch3_raw, 0, x1));
  real x6 = branch1[3] * inv_logit(x6_raw);
  real x7 = branch1[2] * inv_logit(x7_raw);
  vector[N] ages;
  
  ages[1] = x1;
  ages[2:5] = branch1;
  ages[6] = x6;
  ages[7] = x7;
  ages[8:9] = branch2;
  ages[10:11] = branch3;
}
model {
  ages ~ gamma(shape, rate);
  target += log(branch1[2]) + log_inv_logit(x7_raw) + log1m_inv_logit(x7_raw);
  target += log(branch1[3]) + log_inv_logit(x6_raw) + log1m_inv_logit(x6_raw);
}

With the help of ChatGPT I managed to generalize it like this:

data {
  int<lower=0> N;
  vector[N] shape;
  vector[N] rate;
  array[N] int parent_index; // Additional input data
}
parameters {
  real<lower=0> x_root;
  vector[N - 1] x_raw;
}
transformed parameters {
  vector[N] ages;
  ages[1] = x_root;
  
  for (n in 2 : N) {
    if (parent_index[n] == 1) {
      ages[n] = x_root * inv_logit(x_raw[n - 1]);
    } else {
      ages[n] = ages[parent_index[n]] * inv_logit(x_raw[n - 1]);
    }
  }
}
model {
  ages ~ gamma(shape, rate);
  for (n in 2 : N) {
    target += log(ages[parent_index[n]]) + x_raw[n - 1] - 2 * log1p_exp(x_raw[n - 1]);
  }
}

The conversation for those interested: Generalized Stan Code for Trees

The only thing I did after the last suggestion from ChatGPT was to fix the array syntax with stanc and remove the function in the beginning which didn’t end up in the final model anyway. EDIT: I reformulated the Jacobian adjustment a little bit.

This model samples well for quite large trees. Thank you, @spinkney and @jsocolar!

5 Likes

With a bit of extra data work you can vectorize that entire thing. I haven’t tested this but something like

transformed data {
 int total_parents = 0;
 int total_children = 0;
 
  for (n in 2 : N) {
  if (parent_index[n] == 1)  total_parents += 1;
    else total_children += 1;
  }
 
  array[total_parents] parent1_index;
  array[total_children] children_index;

  int count1 = 0, count2 = 0;
  for (n in 2 : N) {
    if (parent_index[n] == 1) {
      count1 += 1;
      parent1_index[count1] = n;
    } else {
      count2 += 1;
      children_index[count2] = n;
    }
  }
}
...

transformed parameters {
  vector[N] ages;
  ages[1] = x_root;
  
  ages[parent1_index] = x_root * inv_logit(x_raw[1:total_parents]);
  ages[children_index] = ages[parent_index[children_index]] * inv_logit(x_raw[total_parents + 1:N - 1]);
}
model {
  ages ~ gamma(shape, rate);
  target += log(ages[parent_index[2:N]]) + x_raw[1:N - 1] - 2 * log1p_exp(x_raw[1:N - 1]);
}

I was curious to try to vectorize the model after your suggestion. Unfortunately it doesn’t seem to work. Here is a reproducible example in R with cmdstanr (data and model):

library(cmdstanr)

"data {
  int<lower=0> N; // Number of data points
  int<lower=0> P; // number of childrens to root
  int<lower=0> C; // number of nodes below childrens to root
  array[N] int parent_index;
  vector<lower=0>[N] mean;
  vector<lower=0>[N] sd;
  int<lower=0, upper=1> use_gamma;
  int<lower=0, upper=1> vectorize;
}
transformed data {
  vector[N] meanlog;
  vector<lower=0>[N] sdlog;
  vector<lower=0>[N] shape;
  vector<lower=0>[N] rate;
  meanlog = log(mean.^2 ./ sqrt(mean.^2+sd.^2));
  sdlog = sqrt(log(1+(sd./mean).^2));
  shape = pow(mean, 2) ./ pow(sd, 2);
  rate = mean ./ pow(sd, 2);
}
parameters {
  real log_x_root;
  vector[N - 1] x_raw;
}
transformed parameters {
  vector[N] ages;
  if (vectorize == 1) {
    // First row (i = 1): root
    // The next P: 2 : (P+1): immidiate childrens of root
    // The rest: (P+2):N
    ages[1] = exp(log_x_root);
    ages[2 : (P + 1)] = exp(log_x_root) * inv_logit(x_raw[1 : P]);
    ages[(P + 2) : N] = ages[parent_index[(P + 2) : N]] .* inv_logit(x_raw[(P + 1) : (N - 1)]);
  } else {
    ages[1] = exp(log_x_root);
    for (n in 2 : N) {
      if (parent_index[n] == 1) {
        ages[n] = exp(log_x_root) * inv_logit(x_raw[n - 1]);
      } else {
        ages[n] = ages[parent_index[n]] * inv_logit(x_raw[n - 1]);
      }
    }
  }
}
model {
  target += log_x_root + log(ages[parent_index[2 : N]]) + x_raw[1 : (N - 1)] - 2 * log1p_exp(x_raw[1 : (N - 1)]); 
  // Jacobian for inv_logit and log for immidate childrens of root
  if (use_gamma == 1) {
    ages ~ gamma(shape, rate);
  } else {
    ages ~ lognormal(meanlog, sdlog);
  }
}" -> 
  stan_code


list(N = 11L, 
     P = 3L, 
     C = 7L, 
     parent_index = c(0L, 1L, 1L, 1L, 2L, 3L, 4L, 6L, 6L, 9L, 9L), 
     mean = c(1304.8, 669.4, 1019.9, 1234.8, 304.7, 795.4, 507.7, 494.1, 743, 167.4, 296), 
     sd = c(177.6, 169.6, 119, 303.1, 100.9, 137.3, 228.2, 151.6, 144.6, 60.6, 97.9), 
     use_gamma = 1, # seems to be faster in much larger trees
     vectorize = 0) -> # works
  stan_data_no_vectorize

list(N = 11L, 
     P = 3L, 
     C = 7L, 
     parent_index = c(0L, 1L, 1L, 1L, 2L, 3L, 4L, 6L, 6L, 9L, 9L), 
     mean = c(1304.8, 669.4, 1019.9, 1234.8, 304.7, 795.4, 507.7, 494.1, 743, 167.4, 296), 
     sd = c(177.6, 169.6, 119, 303.1, 100.9, 137.3, 228.2, 151.6, 144.6, 60.6, 97.9), 
     use_gamma = 1, # seems to be faster in much larger trees
     vectorize = 1) -> # doesn't work
  stan_data_vectorize

stan_code |> 
  write_stan_file() |> 
  cmdstan_model(compile = T, 
                cpp_options = list(STAN_CPP_OPTIMS = TRUE,
                                   stan_threads = TRUE), 
                stanc_options = list("O1")) ->
  stan_model


stan_model$sample(data = stan_data_no_vectorize, threads_per_chain = 1) -> sampling_works
stan_model$sample(data = stan_data_vectorize, threads_per_chain = 1, chains = 1) -> sampling_fails

I precalculate N (total number of nodes), P (number of immidate childrens to the root) and C (number of the remaining after root and P).

When sampling fails I get this errors:

> stan_model$sample(data = stan_data_vectorize, threads_per_chain = 1, chains = 1) -> sampling_fails
Running MCMC with 1 chain, with 1 thread(s) per chain...

Chain 1 Rejecting initial value:
Chain 1   Error evaluating the log probability at the initial value.
Chain 1 Exception: gamma_lpdf: Random variable[8] is nan, but must be positive finite! (in '/mnt/c/Users/staff/AppData/Local/Temp/RTMPEI~1/model-2fd04f4769c3.stan', line 49, column 4 to column 30)
Chain 1 Exception: gamma_lpdf: Random variable[8] is nan, but must be positive finite! (in '/mnt/c/Users/staff/AppData/Local/Temp/RTMPEI~1/model-2fd04f4769c3.stan', line 49, column 4 to column 30)
... and repeat
Chain 1 Initialization between (-2, 2) failed after 100 attempts. 
Chain 1  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
Chain 1 Initialization failed.
Warning: Chain 1 finished unexpectedly!

Warning message:
No chains finished successfully. Unable to retrieve the fit. 

Any idea what is going wrong?

The issue is that this can’t be fully vectorized. Each level of the tree can be vectorized but we need to run each level before the next.

With your data you can precalculate the number of levels and the number of nodes in each level. Because you calculate the top level we just need an array that stores the lengths of all the other levels. In this case, there are 3 other levels with lengths 3, 2, 2.

The input data now is:

list(N = 11L, 
     P = 3L, 
     C = 7L, 
     parent_index = c(0L, 1L, 1L, 1L, 2L, 3L, 4L, 6L, 6L, 9L, 9L), 
     L_minus_one = 3,
     levels_length = c(3, 2, 2),
     mean = c(1304.8, 669.4, 1019.9, 1234.8, 304.7, 795.4, 507.7, 494.1, 743, 167.4, 296), 
     sd = c(177.6, 169.6, 119, 303.1, 100.9, 137.3, 228.2, 151.6, 144.6, 60.6, 97.9), 
     use_gamma = 1, # seems to be faster in much larger trees
     vectorize = 1) -> # doesn't work
  stan_data_vectorize

The stan code slices within each level.

data {
  int<lower=0> N; // Number of data points
  int<lower=0> P; // number of childrens to root
  int<lower=0> C; // number of nodes below childrens to root
  array[N] int parent_index;
  int<lower=0> L_minus_one;
  array[L_minus_one] int levels_length;
  vector<lower=0>[N] mean;
  vector<lower=0>[N] sd;
  int<lower=0, upper=1> use_gamma;
  int<lower=0, upper=1> vectorize;
}
transformed data {
  vector[N] meanlog;
  vector<lower=0>[N] sdlog;
  vector<lower=0>[N] shape;
  vector<lower=0>[N] rate;
  meanlog = log(mean.^2 ./ sqrt(mean.^2+sd.^2));
  sdlog = sqrt(log(1+(sd./mean).^2));
  shape = pow(mean, 2) ./ pow(sd, 2);
  rate = mean ./ pow(sd, 2);
}
parameters {
  real log_x_root;
  vector[N - 1] x_raw;
}
transformed parameters {
  vector[N] ages;
  if (vectorize == 1) {
    // First row (i = 1): root
    // The next P: 2 : (P+1): immidiate childrens of root
    // The rest: (P+2):N
    ages[1] = exp(log_x_root);
    ages[2 : (P + 1)] = exp(log_x_root) * inv_logit(x_raw[1 : P]);
    {
      int start = P + 2;
      for (i in 1:L_minus_one) {
        int end = start + levels_length[i] - 1;
        ages[start : end] = ages[parent_index[start : end]] .* inv_logit(x_raw[(start - 1) : (end - 1)]);
        start = end + 1;
      }
    }
  } else {
    ages[1] = exp(log_x_root);
    for (n in 2 : N) {
      if (parent_index[n] == 1) {
        ages[n] = exp(log_x_root) * inv_logit(x_raw[n - 1]);
      } else {
        ages[n] = ages[parent_index[n]] * inv_logit(x_raw[n - 1]);
      }
    }
  }
}
model {
  target += log_x_root + log(ages[parent_index[2 : N]]) + x_raw[1 : (N - 1)] - 2 * log1p_exp(x_raw[1 : (N - 1)]); 
  // Jacobian for inv_logit and log for immidate childrens of root
  if (use_gamma == 1) {
    ages ~ gamma(shape, rate);
  } else {
    ages ~ lognormal(meanlog, sdlog);
  }
}
2 Likes

As I suspected after thinking through the problem today. I guess it should work to parallelise the chains (with reduce_sum) as well with this slicing?

This model can be further simplified like this:

data {
  int<lower=0> N;
  vector[N] shape;
  vector[N] rate;
  array[N] int parent_index;
}
parameters {
  vector[N] x_raw;
}
transformed parameters {
  vector[N] ages;
  ages[1] = exp(x_raw[1]);
  
  for (n in 2 : N) {
      ages[n] = ages[parent_index[n]] * inv_logit(x_raw[n]);
  }
}
model {
  ages ~ gamma(shape, rate);
  target += sum(x_raw);
  for (n in 2 : N) {
    target += log(ages[parent_index[n]]) - 2 * log1p_exp(x_raw[n]);
  }
}

Then we have a set of parameters that are on the real domain, and quite close to zero. A good setup for successful sampling!