Hi there, I’m not being able to use Stan to fit a model where I have an integral of a function which has to be solved numerically, using Stan’s algebraic equation solver, that depends on the data, parameters and derived parameters.

I’ve thought about transforming the problem into a simpler form to show it here, but I’m afraid I miss some possibly relevant technical details, so I’ll show here the problem in it’s full form, and hopefully I’m able to make things clear.

# The Model

Starting from the beginning my model has 3 parameters

\theta = \{h, \Omega_m,\Omega_r\},

and one derived parameter

\lambda = \frac{1}{2} + W_0 \left( \frac{\Omega_m + \Omega_r}{2e^{1/2}} \right).

My model tries to replicate something called the gravitational wave luminosity distance, given by d_{GW}(z), where z is the independent variable, which we call redshift.

The model states that

d_{GW}(z) = \sqrt{\frac{1-\lambda}{1-\lambda/E^2(z)}} e^{\frac{\lambda}{2} \frac{E^2(z) - 1}{E^2(z)}} \frac{3(1+z)}{h} \int_0^z \frac{1}{E(x)} dx,

where E(z) is a function which has to be solved numerically from the following equation

( E^2(z) - 2 \lambda) e^{\lambda/E^2(z)} = \Omega_m (1 + z)^3 + \Omega_r (1 + z)^4.

Observations then provide me with the redshift z for each event, it’s gravitational wave luminosity distance d_{GW}(z) and an error associated with that measurement \sigma(z).

# Implementation in Stan

The data block is quite simple, and is given by

```
data {
int N1;
array[N1] real redshift;
array[N1] real luminosity_distance;
array[N1] real error;
}
```

The parameter block is quite simple as well, as I simply have to declare each of the three parameters, with the corresponding constrains

```
parameters {
real<lower=0> h;
real<lower=0,upper=1> Omega_m;
real<lower=0,upper=1> Omega_r;
}
```

Assuming a Gaussian likelihood and wide Gaussian priors we have the following model block, where dGW is d_{GW}(z) given by the equation shown before

```
model {
h ~ normal(0.7, 10);
Omega_m ~ normal(0.284, 10);
luminosity_distance ~ normal(dGW, error);
}
```

My problems now start in the transformed parameters block. My sequence is as follows:

- Compute the value of \lambda directly from \theta;
- Compute the value of E(z) at each redshift given by the dataset (in the code
`redshift[i]`

); - Integrate 1/E(z) from 0 to
`redshift[i]`

; - Compute the final result for d_{GW}(z).

The following code is meant to represent that, however, it’s not fully functional

```
transformed parameters {
array[N1] real dGW; // luminosity distance for gravitational waves
real correction; // correction from modification to gravity
real dL; // background luminosity distance
real E; // value of E(z) evaluated at z = redshift[i]
real lambda; // quantity derived from Omega_m and Omega_r
for (i in 1:N1) {
lambda = 0.5 + lambert_w0( (Omega_m + Omega_r)/(2*exp(1/2)) );
E = Efunc(redshift[i], theta, lambda, x_r, x_i);
correction = ((1-lambda)/(1-lambda/E^2))^0.5 * exp(lambda/2 * (E^2 - 1)/E^2);
dL = (1.0+redshift[i]) * (3/h) * integrate_1d(integrand, 0, redshift[i], {h, Omega_m, Omega_r}, x_r, x_i);
dGW[i] = correction * dL
}
}
```

# The problems

**1 -** The algebraic solver - 10.1 Algebraic equation solver - takes vectors as inputs and outputs whereas the integrator - 10.4 1D integrator - uses arrays.

Obviously, I can convert one to the other, it doesn’t look good, and is quite confusing, but it works.

**2 -** I cannot communicate the variables `redshift[i]`

and `lambda`

to the algebraic system using the algebraic solver.

If I define, as specified in 10.1 Algebraic equation solver, the algebraic system which I would like to solve as `Esystem`

, given by

```
// auxiliary function to compute E(z) with algebraic solver
vector Esystem(vector y, vector theta, data array[] real x_r, array[] int x_i) {
real E = y[1];
real Omega_m = theta[2];
real Omega_r = theta[3];
vector[1] output;
// z = (?) - how do i communicate z?!
lambda = 0.5 + lambert_w0( (Omega_m + Omega_r)/(2*exp(1/2)) ); // how do i communicate lambda to avoid computing this every time?!
output[1] = (E^2 - 2*lambda)*exp(lambda/E^2) - Omega_m*(1+z)^3 - Omega_r*(1+z)^4
return output
}
```

Which I would then call with another function

```
real Efunc(z, theta, lambda, x_r, x_i) {
vector[1] y_guess;
y_guess = 1;
// annoying... algebra solver requires vector whereas integrator requires array and i declared it as an array
vector[3] thetavec;
thetavec[1] = theta[1];
thetavec[2] = theta[2];
thetavec[3] = theta[3];
// how do i communicate z and lambda?!
return algebra_solver_newton(Esystem, y_guess, thetavec, x_r, x_i)[1];
}
```

You can see that there is no way for me to communicate both `redshift[i]`

, which represents a given piece of data, nor `lambda`

, a derived parameter.

**3 -** I cannot communicate `lambda`

to the function `integrand`

.

The function `integrand`

is defined as follows:

```
real integrand(real x, real xc, real[] theta, real[] x_r, int[] x_i) {
// (?) how do i get lambda over here?
// (?) do i have to define this function? couldn't i just call the integrator in the function below?
return (Efunc(x, theta, lambda, x_r, x_i))^(-1);
}
```

Although I could recompute its value, much like in the problem before, this would be extremely inefficient as it only has to be computed on every leapfrog step, not every function call.

**4 -** Is it possible to avoid defining the previous function and calling `(Efunc(x, theta, lambda, x_r, x_i))^(-1)`

directly in the `integrate_1d`

function in the transformed parameters?

**5 -** Regarding good practices, when should I be using a vector and when should I be using an array? As stated in problem 1, sometimes vectors are used, sometimes arrays are used. I know I can make an array of vectors, but specifically, for \theta, should I be using a vector or an array? And for my dataset?

**6 -** How it be smarter to define y(z) = E^2(z) and solve the algebraic system for y(z) instead of E(z) and then in the end compute its square root? Because I imagine that the algorithm will compute the square root at each step to see if it matches zero, something which is very expensive and avoidable in this case.

Any other tip, regarding optimization, good practices, etc. is more than welcome!

Thank you very much for your time, this post turned longer than I expected, and if you have any question, feel free to ask!