Hi there,
I'm a new Stan user who is also new to mixed-effects models, about which I'm posting a question. I have a two-level hierarchical model with repeated measurements, roughly 25,000 observations, that I run from R. It is a nonlinear mixed-effects model with a logistic mean function and beta-distributed residuals. The Stan model I'm posting is a simplified toy version of the one I'm using, with accompanying toy data.

The issue: my initial parameter estimates are used for some amounts of data but ignored for others. When I include 8412 rows of data, my initial estimates are used, but with 8413 rows or more Stan rejects my initial values and falls back to the default random initialization on (-2, 2). In that case the log probability fails to initialize, i.e. "log probability evaluates to log(0)" or similar errors.

I'm posting the R script and Stan file below and uploading the data. Convergence is not my concern here (the model converges fine), which is why I run only one iteration and one warmup. I'm fully aware it may be a very trivial issue, but I have not been able to resolve it for quite some time. Please let me know if you can identify the problem. Thanks.
Stan Model:
data{
  int N;              // number of total observations
  int P;              // number of clusters
  int NUM[N];         // cluster ID for each observation
  vector[N] IND;      // time of observation
  real DEP[N];        // dependent variable
}
parameters{
  // Fixed effects
  real<lower=0,upper=1> theta_S0;  // mean baseline dependent variable for average individual
  real theta_r;                    // mean rate of change for average individual in logistic regression
  real<lower=0,upper=100> tau;     // beta shape parameter
  // Random effects (re)
  vector[P] eta_baseline;          // re for baseline
  vector[P] eta_rate;              // re for rate
  real<lower=0> omega_baseline;    // sd of baseline re
  real<lower=0> omega_rate;        // sd of rate re
}
transformed parameters{
  vector[N] baseline_cov;
  vector[N] rate_cov;
  vector[N] DEP0;
  vector[N] r;
  vector<lower=0,upper=1>[N] muS;  // mean dependent variable
  baseline_cov = rep_vector(theta_S0, N);
  rate_cov = rep_vector(theta_r, N);
  DEP0 = baseline_cov .* exp(eta_baseline[NUM]);         // random effect on baseline
  r = rate_cov + eta_rate[NUM];                          // random effect on rate
  muS = DEP0 ./ (DEP0 + (1 - DEP0) .* exp(-r .* IND));   // logistic function as mean behavior
  muS = (muS * (P - 1) + 0.5) / P;                       // transformation to keep muS in the open interval (0,1)
}
model{
  // Priors
  omega_baseline ~ normal(0, 1);
  omega_rate ~ normal(0, 1);
  eta_baseline ~ normal(0, omega_baseline);
  eta_rate ~ normal(0, omega_rate);
  theta_S0 ~ uniform(0, 1);   // orig lognormal(0,1?)
  theta_r ~ normal(0, 1);
  tau ~ uniform(0, 100);      // orig uniform(0,100)
  // Likelihood
  DEP ~ beta(muS * tau, (1 - muS) * tau);
}
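One sanity check I can think of (this is an assumption on my part, not something I have confirmed against this data): since muS is squeezed into the open interval (0,1), one remaining way the beta likelihood can evaluate to log(0) is if DEP itself sits at or outside the boundaries 0 and 1. A quick check in R:

```r
# The beta density is zero at 0 and 1 (for shape parameters > 1),
# so any boundary value in DEP makes the log probability -Inf.
data <- read.csv("Model_data.csv")
DEP <- data[, 3]
range(DEP)                     # should lie strictly inside (0, 1)
which(DEP <= 0 | DEP >= 1)     # indices of offending rows, if any
```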
R Script:
data <- read.csv("Model_data.csv")
temp_data <- data[1:8412,] #Uses my initial values for 8412, but ignores them for 8413
N <- nrow(temp_data)
P <- length(unique(temp_data[,1]))
NUM<-temp_data[,1]
IND<-temp_data[,2]
DEP<-temp_data[,3]
dat_list <- list(N = N, P = P, NUM = NUM, IND = IND, DEP = DEP)
# Create initial estimates
init = function() list(theta_S0 = runif(1,0,1),
theta_r = rnorm(1,.147,.1),
tau = runif(1,75,100),
omega_baseline = rlnorm(1,log(.405)-.5,1),
omega_rate = rlnorm(1,log(.206)-.5),
eta_baseline = rep(0,P),
eta_rate = rep(0,P))
parameters <- c("theta_S0", "theta_r", "omega_baseline", "omega_rate", "tau")
nChains <- 1
nPost <- 1
nBurn <- 1
nThin <- 1
nIter <- (nPost + nBurn) * nThin
nBurnin <- nBurn * nThin
model_name <- "Model"
fit <- stan(file = paste(model_name, ".stan", sep = ""),
            model_name = model_name,
data = dat_list,
pars = parameters,
iter = nIter,
warmup = nBurnin,
thin = nThin,
init = init,
chains = nChains,
verbose = TRUE,
control=list(adapt_delta=0.9999, stepsize = 0.01, max_treedepth =20)
)
Model_data.csv (732.5 KB)
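In case it helps localize the problem, here is how I have been comparing the largest subset that accepts my inits (8412 rows) with the smallest one that rejects them (8413 rows). This is just a diagnostic sketch, assuming the column layout from the script above (cluster ID, time, dependent variable):

```r
data[8413, ]                       # inspect the first newly included row
summary(data[1:8412, 3])           # does the DEP range change
summary(data[1:8413, 3])           # between the two subsets?
length(unique(data[1:8412, 1]))    # does the number of clusters P change?
length(unique(data[1:8413, 1]))
```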