1-step-ahead forecasting model fitting

Hey all,

I’ve been trying to fit this model for months, and can’t figure out what’s going wrong, but suspect a combination of non-identifiability and dynamical complexity.

Now, I’m trying to write a 1-step-ahead forecasting model, and keep getting syntax errors that don’t make sense to me? Anywhere, here’s the model…

functions{
  real[] dZ_dt(
    real t,
    real[] Z,
    real[] theta,
    real[] x_r,
    int[] x_i){
    real P = Z[1];
    real H = Z[2];

    real r = theta[1];
    real O = theta[2];
    real h = theta[3];
    real b = theta[4];
    real c = theta[5];
    real u = theta[6];

    real dP_dt = P*r - H*(exp(O)*P/(1 + exp(O)*P*exp(h)));
    real dH_dt = b + H*(c*(exp(O)*P/(1 + exp(O)*P*exp(h)))-u);
    return({dP_dt,dH_dt});
  }
}

data {
  int<lower=0>N; // Define N as non-negative integer
  real ts[N]; // Assigns time points to N
  real<lower=0>y[N,2]; // Define y as real and non-negative
}

parameters {
  real<lower=0>r;
  real<lower=0,upper=1>O;
  real<lower=0>h;
  real<lower=0>b;
  real<lower=0>c;
  real<lower=0,upper=1>u;
  real<lower=0>sigma[2];
}

transformed parameters{
  for (i in 2:N){ // Start at 2nd time step because giving the first
  real Z[i,2]
  = integrate_ode_rk45(dZ_dt, // Function
  y[i-1], // Initial value (empirical data point at previous time step)
  ts[i-1], // Initial time step
  ts[i], // Next time step (time step to be solved/estimated)
  {r, exp(O), exp(h), b, c, u},
  rep_array(0.0,2),rep_array(0,2),1e-10,1e-10,2e4);
  }
}

model {
  r~normal(2.5,1);
  O~normal(log(0.01),2);
  h~normal(log(0.07),2);
  b~normal(35,1);
  c~normal(0.3,1);
  u~normal(0.4,1);
  sigma~lognormal(-1, 1);
  for (k in 1:2) {
    y[ , k] ~ lognormal(log(Z[ , k]), sigma[k]);
  }
}

generated quantities {
  real y_rep[N, 2];
  for (k in 1:2) {
    for (n in 1:N)
      y_rep[n, k] = lognormal_rng(log(Z[n, k]), sigma[k]);
  }
}

O and h are logged because their true values are close to 0, and when multiplied (as they are in the mechanistic model) are VERY close to zero.

And here’s the syntax error I can’t seem to shake:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Fourth argument to integrate_ode_rk45 must have type real[ ];  found type = real. 
 error in 'model2486523de9f_OSA_Lognormal' at line 51, column 49
  -------------------------------------------------
    49:   ts[i], // Next time step (time step to be solved/estimated)
    50:   {r, exp(O), exp(h), b, c, u},
    51:   rep_array(0.0,2),rep_array(0,2),1e-10,1e-10,2e4);
                                                        ^
    52:   }
  -------------------------------------------------

PARSER EXPECTED: ")"
Error in stanc(filename, allow_undefined = TRUE) : 
  failed to parse Stan model 'OSA_Lognormal' due to the above error.

Despite a ton of incredible support from this community, I’ve been stuck on this model for way too long, so any help would be appreciated!

I’m also happy to update this topic with what I was trying before if that would be helpful :)

1 Like

The function has signature real[ , ] integrate_ode_rk45(function ode, real[] initial_state, real initial_time, real[] times, real[] theta, real[] x_r, int[] x_i, real rel_tol, real abs_tol, int max_num_steps)

For the argument real[] times you’re passing in ts[i] which is of type real, which is the meaning of the error message Fourth argument to integrate_ode_rk45 must have type real[ ]; found type = real. The difference is that real[] is an array, and real is just a scalar.

I’ve not used the ODE methods before, but I’d guess that there should just be one call to the ODE solver that looks something like integrate_ode_rk45(dZ_dt, y[1], ts[1], ts, ...) rather than calling it in a loop? The argument times are the points at which the solution is returned.

EDIT: To fix just the error you have, replace ts[i] with ts[i:i], which will make sure it’s the right type.

4 Likes

Okay okay, that makes a lot of sense––I’ve made the change and the error is resolved! Thank you so much!

You’re right about the ODE solver; usually it’s just a single call, not a loop, but I’m trying to get the solver to solve at each time point independently, rather than using memory of previous solutions to inform the current solution.

I’m now (somehow) getting a more confusing error in the model block:

model {
  r~normal(2.5,1);
  O~normal(log(0.01),2);
  h~normal(log(0.07),2);
  b~normal(35,1);
  c~normal(0.3,1);
  u~normal(0.4,1);
  sigma~lognormal(-1, 1);
  for (k in 1:2) {
    y[ , k] ~ lognormal(log(Z[ , k]), sigma[k]);
  }
}

That reads:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Variable "Z" does not exist.
 error in 'model2481bab4a7c_OSA_Lognormal' at line 63, column 29
  -------------------------------------------------
    61:   sigma~lognormal(-1, 1);
    62:   for (k in 1:2) {
    63:     y[ , k] ~ lognormal(log(Z[ , k]), sigma[k]);
                                    ^
    64:   }
  -------------------------------------------------

Error in stanc(filename, allow_undefined = TRUE) : 
  failed to parse Stan model 'OSA_Lognormal' due to the above error.

I figure this must be an issue arising from the loop in the transformed parameters block, but have no idea how to address it:

transformed parameters{
  for (i in 2:N){ // Start at 2nd time step because giving the first
  real Z[i,2]
  = integrate_ode_rk45(dZ_dt, // Function
  y[i-1], // Initial value (empirical data point at previous time step)
  ts[i-1], // Initial time step
  ts[i:i], // Next time step (time step to be solved/estimated)
  {r, exp(O), exp(h), b, c, u},
  rep_array(0.0,2),rep_array(0,2),1e-10,1e-10,2e4);
  }
}

If you have any ideas, I’d be super appreciative to hear them :)

1 Like

Right, because you’re using it to go from known point to an estimate at some future time.

In the transformed parameters block you have written

for(i in 2:N){
    real Z[i, 2] = ...;
}

This is declaring an array of reals of size [i, 2] named Z on every iteration of the loop. The type specifiers real, vector etc. are used to declare and create new variables, and also the Z inside the for loop is in the scope of the for loop – see here https://mc-stan.org/docs/2_23/reference-manual/for-loops.html

What I believe you want is something like this:

transformed parameters{
real Z[N, 2];
for(i in 2:N){
    Z[i, :] = ...;
}

Which first declares an array named Z, and then assigns values to it’s elements.

1 Like

Glad the community has been helpful! Hopefully we can help you finish sorting this out.
ODE models are not my specialty, so I’m not 100% sure about this, but I think (like @RJTK suggested) you probably don’t need the loop because the integrator essentially handles that for you. So maybe all you need in transformed parameters is something like this:

transformed parameters{
real Z[N, 2] = integrate_ode_rk45(
  dZ_dt,  // Function
  y[1],   // Initial state
  ts[1],  // Initial time
  ts,     // Array of solution times
  {r, exp(O), exp(h), b, c, u},
  rep_array(0.0,2),rep_array(0,2),1e-10,1e-10,2e4);
}

Hopefully someone with more ODE experience like @bbbales2 can chime in here because he would know a lot better than I would.

1 Like

I think what OP has done is correct, each initial point y[i] is known, and they want to compare the forecast of y[i + 1] given by the ODE model to the actual value. AFAICT this does actually require a loop?

1 Like

Ah, good point!

@RJTK, This makes a lot of sense to me––thank you!

I tried your solution, and unfortunately, Stan seems unhappy with the “:”…

I changed the TP block to:

transformed parameters{
real Z[N, 2];
for(i in 2:N){
  Z[i, :] = integrate_ode_rk45(dZ_dt, // Function
  y[i-1], // Initial value (empirical data point at previous time step)
  ts[i-1], // Initial time step
  ts[i:i], // Next time step (time step to be solved/estimated)
  {r, exp(O), exp(h), b, c, u},
  rep_array(0.0,2),rep_array(0,2),1e-10,1e-10,2e4);
  }
}

Which throws the error code:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Dimension mismatch in assignment; variable name = Z, type = real[ ]; right-hand side type = real[ , ].
Illegal statement beginning with non-void expression parsed as
  Z[i, :]
Not a legal assignment, sampling, or function statement.  Note that
  * Assignment statements only allow variables (with optional indexes) on the left;
  * Sampling statements allow arbitrary value-denoting expressions on the left.
  * Functions used as statements must be declared to have void returns

 error in 'model1c63cd99a8e_OSA_Lognormal_2' at line 45, column 2
  -------------------------------------------------
    43: real Z[N, 2];
    44: for(i in 2:N){
    45:   Z[i, :] = integrate_ode_rk45(dZ_dt, // Function
         ^
    46:   y[i-1], // Initial value (empirical data point at previous time step)
  -------------------------------------------------

PARSER EXPECTED: "}"
Error in stanc(filename, allow_undefined = TRUE) : 
  failed to parse Stan model 'OSA_Lognormal_2' due to the above error.

@Jonah: Thanks too for your response! Yes, I should have been clearer: I’m trying to write a one-step-ahead forecasting model, wherein the estimation at the current time step is made using only the previous time step. I’ve tried fitting this model traditionally:

transformed parameters {
  real Z[N, 2]
    = integrate_ode_rk45(dZ_dt, y_init, 0, ts, {r, exp(O), exp(h), b, c, u},
                         rep_array(0.0, 2), rep_array(0, 2),
                         1e-10, 1e-10, 2e4);
}

But think that the dynamical complexity of the data prevents the model from capturing the oscillations in the data, which, I think, could be worked around with a one-step-ahead model.

Because of this, I think I need a loop, but I’m open to trying anything! I tried your suggestion as well, and while syntactically sound, I’m not sure it would give me what I’m looking for?

1 Like

The error

Says that the LHS is a 1d array (it’s Z[i, :]) whereas the RHS is a 2D array real[ , ]. The correct statement is possibly Z[i:i, :] = integrate_ode_rk45(…)`? I’ve not used the ODE methods so I’m not certain.

1 Like

The 1 step ahead thing is essentially then a hack to allow some kind of system noise in – a kind of crude approach to a stochastic differential equation, I would say. You could go a step further and introduce a latent noise term, with variance g * delta t (ie some weighting of the time difference from last step), such that
y ~ lognormal(log(Z[ , k]+noise[k]), sigma[k])
This would still be kind of rough but less so, and at least more along the lines of where you’re trying to go, I think…

4 Likes

We discussed the possible non-identifiabilities of the model at Change from global model fit to one-step-ahead prediction

I would still strongly suspect the non-identifiability is the main problem - especially since you mention you work with O and h on the log scale as they are very small and I would expect some of the non-identifiabilities to arise precisely when O is small.

I don’t want to pressure you into following my advice (it might in the end turn out to be wrong advice :-), but my experience is that attempts to work around modelling problems with some empirical tweaks to how the model is coded rarely help and usually only delay the necessity to delve in the math and understand the root of the problem. But obviously, you do you - it definitely seems you are learning a lot about Stan in the process and that’s also something :-)

EDIT: If you also could share a plot of the actual data, that might also help us see if the observed dynamics is rich enough to inform all the parameters…

Looking at the traceplots in Model fitting and sampling issue: Only 1 chain sampling properly makes me suspect even more that non-identifiability is an issue as there seems to be strong negative correlation between O and h

Best of luck!

1 Like

Yes, this works! Awesome––thank you so much! I’ll update this post after I run everything again to say how it went! Thank you so much for your help and patience! :)

1 Like

This is a neat idea; I’ll give it a try! Thanks!! :)

1 Like

Yes––I still very much agree that this is an identifiability issue. I tried combining O and h, as well as simplifying the mechanistic model, without much luck. I think part of the problem is that when multiplied, the true value of Oh is very small. I tried logging the prior to get around this, but to no avail, so I’m kind of at a loss.

Here’s what I tried when logging the priors for O and h (I also tried combining O and h, and logging that single prior, but I won’t show that here for brevity):

functions{
  real[] dZ_dt(
    real t, 
    real[] Z, 
    real[] theta, 
    real[] x_r, 
    int[] x_i){
    real P = Z[1]; 
    real H = Z[2];

    real r = theta[1]; 
    real O = theta[2];
    real h = theta[3];
    real b = theta[4];
    real c = theta[5];
    real u = theta[6];

    real dP_dt = P*r - H*(exp(O)*P/(1 + exp(O)*exp(h)*P));
    real dH_dt = b + H*(c*(exp(O)*P/(1 + exp(O)*exp(h)*P))-u);
    return({dP_dt,dH_dt});
  }
}
data {
  int<lower=0> N;           
  real ts[N];                 
  real y_init[2]; 
  //real y0[2];
  real<lower=0> y[N, 2];    
}
parameters {
  real<lower=0>r;
  real O;
  real h;
  real<lower=0>b;
  real<lower=0>c;
  real<lower=0,upper=1>u;
  real<lower=0>Z_init[2];  
  real<lower=0>sigma[2];   
}
transformed parameters {
  real Z[N, 2]
    = integrate_ode_rk45(dZ_dt, y_init, 0, ts, {r, exp(O), exp(h), b, c, u},
                         rep_array(0.0, 0), rep_array(0, 0),
                         1e-10, 1e-10, 5e2);
}
model {
  r~normal(2.5,1);
  O~normal(log(0.01),2); 
  h~normal(log(0.07),2); 
  b~normal(35,1); 
  c~normal(0.3,1); 
  u~normal(0.4,1);
  sigma~lognormal(-1, 1);
  Z_init~lognormal(log(3), 1);
  for (k in 1:2) {
    y_init[k] ~ lognormal(log(Z_init[k]), sigma[k]);
    y[ , k] ~ lognormal(log(Z[ , k]), sigma[k]);
  }
}
generated quantities {
  real y_init_rep[2];
  real y_rep[N, 2];
  for (k in 1:2) {
    y_init_rep[k] = lognormal_rng(log(Z_init[k]), sigma[k]);
    for (n in 1:N)
      y_rep[n, k] = lognormal_rng(log(Z[n, k]), sigma[k]);
  }
}
1 Like