Usage of Algebraic Solver in User Specified Function?


I have a user-specified function, to be defined in the “functions” block, which requires a root-solving routine embedded within it; the closest match I found among Stan’s existing functions is the algebra_solver(.) function.

While it seems clearly possible to use existing Stan functions in user-specified functions (e.g. cos(.), append_col(.)), it is unclear to me whether this is also the case for algebra_solver(.) and any similar functions.

This is because the documentation for algebra_solver(.) states very specific declarations for the input variables, i.e. on page 170 of the Stan User Guide: “The arguments for the real data x_r and the integer data x_i must be expressions that only involve data or transformed data variables. theta, on the other hand, must only involve parameters. Note that there are no restrictions on the initial guess, y_guess, which may be a data or a parameter vector.”

Moreover, since the block structure of Stan’s model body is not retroactive to previous blocks (i.e. specifications in earlier blocks are inherited by later blocks, but not vice versa), it seems to imply that algebra_solver(.), because its inputs must be declared in blocks that follow the functions block, cannot be used in the user-specified function block.

EDIT: actually, after some thought, the block order is only an issue with regard to the actual call to the user-specified function, not the specification of the function itself, no? So if I embed algebra_solver(.) in my function (call this function A), then make the call to function A later in the transformed parameters block, it should work, right?

Is this correct? I am trying to decide whether I should write a root solving routine; otherwise, if there is a more appropriate existing function available, that would be great too!
Thank you for your time,

1 Like

I think you should be able to work around this by using the ‘data’ qualifier on arguments to the function you’re wrapping the algebra solve up in. Search for “Data-only Qualifiers” here:

Maybe start by just writing the simplest Stan model you can think of just to make sure this is all working. Let me know if you have problems though. It’s been an issue before that things have been unexpectedly getting promoted to vars, so if you’re getting weird compiler errors report back.



Thank you for the suggestion. If I understand the documentation correctly, it means that function_A requires three additional arguments to accommodate y_guess, x_r, and x_i.

e.g. (T indicates some arbitrary ‘type’):

vector function_A (T A, T B, T C, data vector y_guess, data real[] x_r, data int[] x_i) { ... }

If ‘yes’, my question then is whether the same must be done for the parameter vector theta?
From the documentation:

theta , on the other hand, must only involve parameters. Note there are no restrictions on the initial guess, y_guess , which may be a data or a parameter vector.

Does this mean that theta must be strictly defined in the “parameters” or “transformed parameters” block and passed into the outer function? Or does a locally declared variable in function_A also qualify as a parameter?

To provide a bit more context to the dilemma,

  1. The root-finding routine embedded in function_A is intended to be a black-box, i.e. the solution from the root-finding is applied locally in function_A without being reported to the system,
  2. function_A itself is also embedded (as the gradient function) in a user-defined RK4 ODE-solver (for technical reasons), call this function_RK4, and
  3. The parameters “theta” sent to function_A for the root-finding routine are updated at every time point in function_RK4.

So if “theta” also needs to be strictly defined in the transformed parameters block (say), it seems that I would also need to specify a function that “reports” the updated values of “theta” back to the system in that block? Something like that?

It seems to me that there is utility in writing a simple solver, e.g. Newton-Raphson or secant, if that is indeed the case.

Thank you for your time,

These are all good questions haha.

It sounds like you want to include an algebra solve on the right hand side of an ODE in Stan. This should work, but it will probably be pretty slow so just beware.

I hacked together a model that had an algebra solve in the ODE right hand side and it compiled at least.

If you have the gradients to write your own simple solver and are confident it’ll converge in a few iterations or whatever, then this could work fine.

There’s considerable stuff happening behind the scenes to make the autodiff work so what is getting reported where is a bit tricky :D. The goal of the Stan model is to calculate the log density of the model at a certain set of parameters and the gradients of that with respect to the parameters in the parameters block. For details on how that connects together, check: [1509.07164] The Stan Math Library: Reverse-Mode Automatic Differentiation in C++

The question with these things is making sure autodiff works through them.


Thank you for the reply.

I lack some background on this topic, but if I understood the gist of AD (in context of the log density gradient) correctly, “more iterations in the root solving routine = bad” due to the increased number of edges (in the computational graph, for the partial derivatives) required to get the derivatives of the log density wrt the parameters. Something like that?

I suppose what I am unclear on (especially at the mention of “making sure autodiff works through” the functions) is: what constitutes a badly defined user-defined function with regard to AD? It seems to me that void functions are only allowed for side effects for this particular reason (autodiff), but I am wondering if there is any guidance on ensuring well-behaved non-void user-defined functions.

[EDIT] Some context: the secant method seems to converge consistently within 5~7 iterations when run in MATLAB on a small sample of simulated parameter values for the equation of interest (the equation is approximately linear with an implicit component). I shall try to run some tests in Stan in the morning.
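For reference, the same sanity check is easy to reproduce outside MATLAB; here is a hypothetical Python sketch of the secant iteration on the equation of interest (the parameter values Ci = 12500, C = 1.5, E = 0.8, K = 100, f = 0.9, Fz = 677.982 are illustrative picks from the simulated ranges, not my actual data):

```python
import math

def g(B, Ci=12500.0, C=1.5, E=0.8, K=100.0, f=0.9, Fz=677.982):
    # Residual whose root we seek:
    # Ci - (B*C*K*f*Fz) / sin(C*atan(B*((1-E)*K + (E/B)*atan(B*K))))
    inner = B * ((1 - E) * K + (E / B) * math.atan(B * K))
    return Ci - (B * C * K * f * Fz) / math.sin(C * math.atan(inner))

def secant(fn, x0, x1, tol=1e-10, max_iter=10):
    # Plain secant iteration; returns the root estimate and iteration count.
    for i in range(1, max_iter + 1):
        f0, f1 = fn(x0), fn(x1)
        if f0 == f1:  # flat to machine precision; can't improve further
            return x1, i
        x2 = x1 - (x0 - x1) * f1 / (f0 - f1)
        if abs(x2 - x1) < tol:
            return x2, i
        x0, x1 = x1, x2
    return x1, max_iter

root, iters = secant(g, 0.01, 0.5)
print(root, iters)
```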

Thank you for your help,

It all gets weird cause some pieces aren’t actually autodiffed (like the root solvers), but they might use autodiff inside them to do the thing I just said isn’t autodiff blah blah gibberish lol. It’s confusing.

The thing to keep in mind is that in a pretty easy to sample problem we might need to evaluate the log likelihood around 50 times per draw. If we’re going to collect 4000 posterior draws, that’s 200000 model evaluations. If we want our inference done in 20 seconds cause we’re impatient (at least I am), then that means we’ve got 100 microseconds to evaluate our log likelihood.
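Spelling that arithmetic out:

```python
draws = 4000            # total posterior draws across chains
evals_per_draw = 50     # rough log-density evaluations per draw
total_evals = draws * evals_per_draw
budget_seconds = 20.0   # the impatient person's time budget
per_eval_seconds = budget_seconds / total_evals
print(total_evals, per_eval_seconds)  # 200000 evaluations, 1e-04 seconds (100 microseconds) each
```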

So even relatively simple numeric things like small ODE solves become difficult in this context. Tack on an iterative solve and it’s a little harder still.

The easiest way to write code that breaks autodiff is if statements.

In R:

f <- function(a) {
  if (a > 5) {
    return(a)
  } else {
    return(2 * a)
  }
}

That’ll create a discontinuous function which violates the assumptions that go into Stan’s samplers.

In R:

f <- function(a) {
  if (a < 1) {
    return(a)
  } else {
    return(a^2)
  }
}

This makes a function that’s non-differentiable at a == 1. This also violates the sampler assumptions. Sometimes you’ll see models where people get away with this.
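If it helps to see it numerically, here's a quick Python sketch of those two pathologies, comparing one-sided finite-difference slopes on each side of the trouble spot:

```python
def f_discont(a):
    # Discontinuous at a == 5: value jumps from 2*a below to a above.
    return a if a > 5 else 2 * a

def f_kink(a):
    # Continuous but non-differentiable at a == 1.
    return a if a < 1 else a ** 2

def one_sided_slopes(f, x0, h=1e-6):
    # Finite-difference slope approaching x0 from the left and from the right.
    left = (f(x0) - f(x0 - h)) / h
    right = (f(x0 + h) - f(x0)) / h
    return left, right

print(one_sided_slopes(f_discont, 5.0))  # left ~2; right is enormous (the jump)
print(one_sided_slopes(f_kink, 1.0))     # left ~1, right ~2: the derivative disagrees
```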

Good thing to check!

Where you get if statements in solvers like these is on the tolerances: you want to quit iterating after you’ve found a solution. If you put that if statement in there, you can create discontinuities as the parameters move around, since the problem becomes easier/harder at different parameter values.

If you don’t have if statements and just run the iterative method a fixed number of times, assuming each iteration is autodiffable, then the overall thing will be autodiffable and you’ll be good. The issue here is you’re not sure if you solved to full precision and you’re also wasting computational effort.
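To see why a fixed iteration count plays nicely with autodiff, here's a toy forward-mode sketch in Python (not Stan's reverse-mode machinery, just the idea): a minimal dual-number type pushed through a fixed-iteration secant solve of x^2 = c. The derivative of the root with respect to c falls out of the same arithmetic, and should match d sqrt(c)/dc = 1/(2 sqrt(c)).

```python
class Dual:
    # Minimal forward-mode dual number: a value plus a derivative.
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __sub__(self, other):
        return Dual(self.val - other.val, self.der - other.der)
    def __mul__(self, other):
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    def __truediv__(self, other):
        return Dual(self.val / other.val,
                    (self.der * other.val - self.val * other.der) / other.val ** 2)

def solve_sqrt(c, iters=6):
    # Fixed number of secant steps on f(x) = x^2 - c: no tolerance check,
    # so the whole computation is one fixed, differentiable composition.
    x0, x1 = Dual(1.0), Dual(2.0)
    for _ in range(iters):
        f0 = x0 * x0 - c
        f1 = x1 * x1 - c
        x2 = x1 - (x0 - x1) * f1 / (f0 - f1)
        x0, x1 = x1, x2
    return x1

root = solve_sqrt(Dual(2.0, 1.0))  # seed der=1 to track d/dc
print(root.val, root.der)  # approaches sqrt(2) and 1/(2*sqrt(2))
```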

In practice I’d give it a go if the algebra solver doesn’t work. You can always run a high precision solve in generated quantities and compare it to the things you were calculating with a fixed number of iterations and get an idea of how much error there was in your calculation. That make sense?
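That comparison is cheap to prototype outside Stan too; a sketch using the toy x^2 = 2 problem, with a long solve standing in for the high-precision generated-quantities check:

```python
def secant_fixed(f, x0, x1, iters):
    # Secant with a fixed number of iterations and no tolerance-based early
    # exit (the early return only fires once we've hit machine precision).
    for _ in range(iters):
        f0, f1 = f(x0), f(x1)
        if f0 == f1:
            return x1
        x0, x1 = x1, x1 - (x0 - x1) * f1 / (f0 - f1)
    return x1

f = lambda x: x * x - 2.0
cheap = secant_fixed(f, 1.0, 2.0, 5)     # fixed-iteration answer used during sampling
precise = secant_fixed(f, 1.0, 2.0, 50)  # high-precision reference solve
print(abs(cheap - precise))              # estimated error of the cheap solve
```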

But anyway just code up your problem and give it a go and see how slow it is. If everything works by default that’s great. Start simple and build. Use generated data so you know you’re fitting the right thing.

1 Like


I see, that makes sense to me, thank you.

I coded a toy example to test the viability of the secant solver; the program did compile and run, but came back with several warnings, which don’t seem to me to be of too much concern. But as a new user to Stan (and MCMC in general), I am unsure whether they indicate a more deeply rooted issue or just bad code on my part.

Some context: I intend to use the output B as the input to something else, i.e. I am not actually interested in how B is distributed, but I wanted to see if this small segment of root-finding code in Stan is well-behaved before nesting it in another function. However, I am unsure whether this toy example is sufficient to conclude that autodiff is working as intended.

[EDIT] I just read the documentation on the first warning regarding the divergent transitions; I’ll try raising adapt_delta next time, but I am not sure if it’s something I’d need to worry about moving forward.

[EDIT 2] Thinking about the problem, it makes sense that many samples were divergent (if I understand the sampling process correctly), especially since the solution for B is exact (numerical errors aside) because B isn’t sourced from data, i.e. there isn’t any random noise to contend with. In that sense, the values of lo_B and up_B are well defined, and the sampler will “step off the edge” more often than not because there isn’t any scatter in the “data” for B, i.e. the distributions of lo_B and up_B will be sharp peaks centered at their respective values. Does that sound right?

Thank you for your time,

Code in R:



library(rstan)

N  <- 50
Ci <- runif(N, min = 12000, max = 12920)
C  <- runif(N, min = 1.4, max = 1.6)
E  <- runif(N, min = 0.7, max = 0.9)

init_val <- function() list(lo_B = 0.05, up_B = 0.2)

# Example scenario, want to generate values of B
fit_B <- stan(
  file = "Z01_algebra_solver_example.stan",  # Stan program
  data = c("N", "Ci", "C", "E"),  # algorithm = "Fixed_param" when using Monte Carlo simulation
  chains = 4,             # number of Markov chains
  warmup = 1000,          # number of warmup iterations per chain
  iter = 5000,            # total number of iterations per chain
  cores = 2,              # number of cores (could use one per chain)
  refresh = 0,            # no progress shown
  init = init_val
)

print(fit_B, pars = c("B"), include = FALSE)

The code in Stan:

functions {
  real B_root(real Ci, real B, real C, real E, real K, real f, real Fz){
    real root_eqx;
    root_eqx = Ci - (B*C*K*f*Fz)/(sin(C*atan(B*((1-E)*K + (E/B)*atan(B*K)))));
    return root_eqx;
  }
  // Secant method root solver
  real secant_root(real Ci, real C, real E, real K, real f, real Fz){
    real x0 = 0.01;
    real x1 = 0.5;
    real x2;
    real f0;
    real f1;
    for (i in 1:10){
      f0 = B_root(Ci, x0, C, E, K, f, Fz);
      f1 = B_root(Ci, x1, C, E, K, f, Fz);
      x2 = x1 - ((x0 - x1)*f1)/(f0 - f1);
      if ( fabs(x2 - x1) < 1E-10 ){
        break;
      }
      x0 = x1;
      x1 = x2;
    }
    return x2;
  }
  real shell_func(real Ci, real C, real E, real K, real f, real Fz){
    real B_out = secant_root(Ci, C, E, K, f, Fz);
    return B_out;
  }
}

data {
  int<lower = 0> N;
  vector[N] Ci;
  vector[N] C;
  vector[N] E;
}

parameters {
  real lo_B;
  real up_B;
}

transformed parameters {
  real K  = 100;
  real Fz = 677.9820;
  real<lower=0> f = 0.9;
  vector[N] B;
  for (i in 1:N){
    B[i] = shell_func(Ci[i], C[i], E[i], K, f, Fz);
  }
}

model {
  lo_B ~ gamma(0.001,0.001);
  up_B ~ gamma(0.001,0.001);
  B ~ uniform(lo_B, up_B);
}

generated quantities {
  //real Ci = uniform_rng(12000,12920);
  //real C  = uniform_rng(1.4,1.6);
  //real E  = uniform_rng(0.7,0.9);
  //real B = shell_func(Ci, C, E, K, f, Fz);
}

The output and warnings in R:

Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
    B ~ uniform(...)

Warning messages:
1: There were 13984 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See 
2: Examine the pairs() plot to diagnose sampling problems
> options(max.print=100000)
> print(fit_B, pars = c("B"), include = FALSE)
Inference for Stan model: Z01_algebra_solver_example.
4 chains, each with iter=5000; warmup=1000; thin=1; 
post-warmup draws per chain=4000, total post-warmup draws=16000.

       mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
lo_B   0.11    0.00 0.00   0.11   0.11   0.11   0.11   0.11   648 1.00
up_B   0.14    0.00 0.00   0.14   0.14   0.14   0.14   0.14   699 1.01
K    100.00     NaN 0.00 100.00 100.00 100.00 100.00 100.00   NaN  NaN
Fz   677.98    0.00 0.00 677.98 677.98 677.98 677.98 677.98     2 1.00
f      0.90    0.00 0.00   0.90   0.90   0.90   0.90   0.90     2 1.00
lp__ 177.26    0.06 1.48 173.60 176.53 177.59 178.35 179.14   577 1.01

Samples were drawn using NUTS(diag_e) at Sun Oct 27 16:31:41 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

You could have problems if lo_B or up_B pass over values of B. If any element of B is not within the range [lo_B, up_B], the uniform lpdf will be undefined and you’ll get divergent transitions (stepsize won’t help with these divergences).
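The failure mode is easy to see from the math of the uniform log density; a minimal Python sketch (not Stan's actual implementation):

```python
import math

def uniform_lpdf(y, lo, up):
    # log of the Uniform(lo, up) density: -log(up - lo) inside the
    # support, undefined (-inf) outside it.
    if lo >= up or y < lo or y > up:
        return float("-inf")
    return -math.log(up - lo)

print(uniform_lpdf(0.12, 0.11, 0.14))  # finite: this B is inside [lo_B, up_B]
print(uniform_lpdf(0.15, 0.11, 0.14))  # -inf: one element outside rejects the draw
```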

Normals are probably the easiest distributions to just play with. Just make sure the standard deviation parameter is greater than zero and have at it.

1 Like