Hi! If anyone is able to help, I have what I think is a specification issue in fitting a nonlinear regression model in Stan.
As a bit of background to the data and system, I have catch per unit effort (CPUE) data for a group of fishermen (which represents the number of sharks they’re fishing, adjusted for fishing time and number of people/vessels), and a list of biological parameters that describe a shark population that they’re fishing. I also have a discrete function (Beverton-Holt) that starts my shark population at its carrying capacity (the max number of sharks in the population, before fishing started). My function uses the parameters to estimate CPUE each year, and what I want to do is obtain posteriors for two of the parameters (the rest are known) by fitting my function to reported CPUE data.
The data looks like this:
| year | id | CPUE |
| 1983 | 1 | 15 .8 |
| 1990 | 2 | 27.3 |
| 1993 | 4 | 20.9 |
| 1996 | 9 | 12.0 |
| 1997 | 3 | 22.0 |
| 1997 | 2 | 16.5 |
| 1998 | 1 | 14.8 |
Fishermen report multiple CPUE values. For some years, we have multiple CPUE observations, and for others we have none. So, the function I wrote simulates CPUE data based on the parameters, and then since the output of that simulation is one value for each year, it returns the correct years in order of the data. If a year occurs multiple times in the data, it’ll return that same value multiple times.
The problem I’m having is that my model is syntactically fine and compiles, but when I run it, it returns the priors essentially unaltered (I’ve tried it with multiple different priors for both parameters). I think maybe I’m using the function in the wrong place - perhaps it should be used as a link function? Anyway, here is my code:
functions {
real fishing_days(real slope, real t, real dmax, real midpoint) {
return dmax/(1 + exp(-slope * (t - midpoint)));
}
// Model fitting function returns BH predictions for each row in the interview data
vector BH_fit(real q, real slope, real alpha, real K, real dmax, real midpoint,
real lambda, int[] n_boats, int[] years, int[] sim_years) {
real Np[num_elements(sim_years)]; // vectors to store simulated values
real N[num_elements(sim_years)];
real Y[num_elements(sim_years)];
real CPUE[num_elements(sim_years)];
vector[num_elements(years)] CPUE_out; // vectors to match simulated values with observed
vector[num_elements(years)] Yield_out;
N[1] = K; // initialize N at carrying capacity
// run simulation and store yield, CPUE, N for each year
for(i in 1:num_elements(sim_years)) {
Np[i] = lambda*N[i] / (1 + alpha*N[i]);
Y[i] = Np[i] * (1 - exp(-q * fishing_days(slope, sim_years[i], dmax, midpoint) * n_boats[i]));
CPUE[i] = Np[i] * (1 - exp(-q));
if (i < num_elements(sim_years)) {
N[i + 1] = Np[i] * exp(-q * fishing_days(slope, sim_years[i], dmax, midpoint) * n_boats[i]);
}
}
// loop over years of observed CPUE data, store simulated value
for (j in 1:num_elements(years)) {
for (k in 1:num_elements(sim_years)) {
if (years[j] == sim_years[k]) {
CPUE_out[j] = CPUE[k];
Yield_out[j] = Y[k];
}
}
}
return CPUE_out;
}
}
data {
int<lower=0> n_obs; // number of fisherman responses
int<lower=0> n_sim_yrs; // number of years in simulation
int<lower=0> year[n_obs]; // response years
int<lower=0> sim_years[n_sim_yrs];
real<lower=0> CPUE[n_obs];
int<lower=0> n_boats[n_sim_yrs];
real<lower=0> K;
real lambda;
real dmax;
real midpoint;
real alpha;
}
parameters {
real<lower = 0> slope;
real<lower = 0> q;
real<lower = 0> sigma_cpue;
}
model {
vector[n_obs] mu_cpue;
slope ~ normal(108, 40); // vague prior around maximum number of fishing days
q ~ normal(0.014, 0.04); // prior with empirical mean and wide standard deviation
sigma_cpue ~ cauchy(1, 10); // vague half-cauchy prior on the standard deviation
mu_cpue = BH_fit(q, slope, alpha, K, dmax, midpoint, lambda, n_boats, year, sim_years);
CPUE ~ normal(mu_cpue, sigma_cpue);
}
generated quantities {
vector[n_obs] mu_cpue;
mu_cpue = BH_fit(q, slope, alpha, K, dmax, midpoint, lambda, n_boats, year, sim_years);
}
I would greatly appreciate any advice or input. I’m new to Stan, and somewhat to Bayesian stats in general, so I apologize if this is a really basic mistake. Thanks!