Inconsistent fits/runtimes

Hi all,

I’m working with rstan, trying to fit a power law model as well as a power law model with an additional linear term (code for each model below), and I have several different data sets to fit to.

About 3/4 of the time, the fits are quick (tens of seconds) and match the data well. However, some fits are clearly failing to converge and return unhelpful parameter values, and some run vastly longer than others (one sat on the same set of 100 iterations for three hours before I gave up).

I’m calling stan_model() for each of the two models below, and then iterating through the data sets calling sampling() for each.
Since I’m just trying to get things working reliably at the moment, I’m only using 1000 iterations per fit. The long runtimes seem to become more frequent the more chains I use.
I’ve also increased max_treedepth to 15 (via the control argument to sampling()) because I was getting warnings about the maximum treedepth being exceeded.

The pairs plots seem to show a pretty strong correlation between the parameters a and k, but I don’t know if that should be a red flag.
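In case it matters, I’m generating those plots with something along these lines (parameter names as in the models below):

```r
# pairwise scatter plots of the posterior draws for the fitted parameters
pairs(fit, pars = c("a", "k", "sigma"))
```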

Is there anything I’m obviously doing wrong? Are there methods I can use to help nail down what’s causing the inconsistency?

Thanks,
Fane

Current priors (normal distribution):

a_dist <- c(mean = 4, sd = 5)
b_dist <- c(mean = 0.04, sd = 0.02)
k_dist <- c(mean = 0.4, sd = 0.2)
sigma_dist <- c(mean = 0.1, sd = 0.05)
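For concreteness, each fit is invoked roughly like this (file and variable names here are placeholders, not my actual code):

```r
library(rstan)

# compile the model once...
pl_mod <- stan_model("power_law.stan")

# ...then reuse the compiled model for each data set
fit <- sampling(
  pl_mod,
  data = list(
    n = length(x), x = x, y = y,
    a_dist = c(4, 5),
    k_dist = c(0.4, 0.2),
    sigma_dist = c(0.1, 0.05)
  ),
  iter = 1000,
  chains = 4,
  control = list(max_treedepth = 15)
)
```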

Power law model:

data{
  int n;          // number of observations
  vector[n] x;    // x data
  vector[n] y;    // y data
  vector[2] a_dist;   // mean, stdev for prior of a
  vector[2] k_dist;   // mean, stdev for prior of k
  vector[2] sigma_dist; // mean, stdev for prior of sigma
}

parameters{
  real a;
  real<lower=0> k;
  real<lower=0> sigma;
}

model{
  real mu;
  a ~ normal(a_dist[1], a_dist[2]);
  k ~ normal(k_dist[1], k_dist[2]);
  sigma ~ normal(sigma_dist[1], sigma_dist[2]);
  for (i in 1:n) {
    mu = a * x[i] ^ k;
    y[i] ~ normal(mu, sigma);
  }
}

Power law + linear term model:

data{
  int n;          // number of observations
  vector[n] x;    // x data
  vector[n] y;    // y data
  vector[2] a_dist;   // mean, stdev for prior of a
  vector[2] b_dist;   // mean, stdev for prior of b
  vector[2] k_dist;   // mean, stdev for prior of k
  vector[2] sigma_dist; // mean, stdev for prior of sigma
}

parameters{
  real a;
  real b;
  real<lower=0> k;
  real<lower=0> sigma;
}

model{
  real mu;
  a ~ normal(a_dist[1], a_dist[2]);
  b ~ normal(b_dist[1], b_dist[2]);
  k ~ normal(k_dist[1], k_dist[2]);
  sigma ~ normal(sigma_dist[1], sigma_dist[2]);
  for (i in 1:n) {
    mu = a * x[i] ^ k + b * x[i];
    y[i] ~ normal(mu, sigma);
  }
}

Well, it would be faster to write the model block like

vector[n] mu;
for (i in 1:n)
  mu[i] = a * x[i] ^ k + b * x[i];
y ~ normal(mu, sigma);  // one vectorized statement instead of n scalar ones
...

but the long / inconsistent runtimes plus the warnings indicate that the fundamental problem is that the posterior is a difficult one to sample from.

Thanks for the feedback. I did change that part of the code, and it definitely helped with speed.

I think I’ve managed to solve the problem. I noticed in the pairs plots that, for some of the fits, all the pairwise scatter plots were forming two distinct groups. I then noticed that the problem fits all had unhelpful values for one of the parameters (k) but not the others, so I imposed a hard upper limit on the value of k (I’m sure this is bad practice, but … it worked).
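The change amounts to something like this in the parameters block (the bound of 2 here is just a placeholder, not the value I actually used):

```stan
parameters{
  real a;
  real<lower=0, upper=2> k;   // hard upper bound on k (placeholder value)
  real<lower=0> sigma;
}
```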

After doing this the runtimes and fits were all as expected.

Power law models are notoriously weakly identified, and principled priors are a must for robust fitting. Hard cutoffs introduce problems in their own right, and in general I strongly recommend soft “containment” priors instead, such as a half-normal with sigma = your_current_cutoff / 2.
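Sketching that out: if `k_max` is whatever hard cutoff you used, you would keep only the physical lower bound on k and replace its current prior with the containment prior (a normal prior on a parameter constrained to be positive is a half-normal):

```stan
parameters{
  real<lower=0> k;   // keep only the physical lower bound; no hard upper limit
}
model{
  // soft containment prior: half-normal with scale k_max / 2,
  // where k_max is the cutoff you had been using
  k ~ normal(0, k_max / 2);
}
```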