Stan ignoring initial values when amount of data increases

Hi there,
I’m a new stan user who is also new to mixed effects models, for which I’m posting a question about. I have two level hierarchical model with repeated measurement for about ~25000 observations that I wrote in R. It is a nonlinear mixed effects models performing logistic regression, with beta distributed residuals. The stan model I’m posting is a simplified toy model of the one I’m using with additional toy data. I’m running into an issue in which my initial estimates for my parameters are used, but then ignored. This happens when I change the amount of data I include. In particular, when I include 8412 rows of data, my initial estimates are used, but when I include 8413 rows of data and more, stan rejects my initial values, and uses the default random initialization between (-2,2). This latter case results in the log probability having issues initializing, i.e. log probability evaluates to log(0) or other such errors. I’m posting the R script, stan file, and uploading the data. It converges fine, which I’m not concerned about, hence the reason for 1 iteration and warm up. Please let me know if someone can identify the issue. I’m fully aware it may be a very trivial issue, but I have not been able to reconcile it for quite some time. Thanks.

Stan Model:
int N; // number of total Observations
int P; // number of clusters
int NUM[N]; // cluster ID for each observation
vector[N] IND; // time of observation
real DEP[N]; // Dependent variable

//Fixed effects
real<lower=0,upper=1> theta_S0; // Mean baseline dependent variable for average individual
real theta_r; // Mean rate of change for average individual in logistic regression
real<lower=0,upper=100> tau; // Beta shape parameter

//Random effects (re)
vector[P] eta_baseline;                 // re for baseline
vector[P] eta_rate;            		// re for rate
real<lower=0> omega_baseline;           // std baseline
real<lower=0> omega_rate;              	// std rate 


transformed parameters{
vector[N] baseline_cov;
vector[N] rate_cov;
vector[N] DEP0;
vector[N] r;
vector[N] ones;
vector<lower=0,upper=1>[N] muS; //Mean dependent variable

for(i in 1:N) ones[i] = 1;

  	baseline_cov = theta_S0*ones;		
  	rate_cov = theta_r*ones;
  	DEP0 = baseline_cov .* exp(eta_baseline[NUM]);	      //Random effect on baseline
  	r = rate_cov + eta_rate[NUM];			            //Random effect on rate
  	muS = DEP0 ./ (DEP0+(1-DEP0).*exp(-r .* IND));	       //Logistic function as mean behavior
  	muS = (muS*(P-1)+.5)/P;				             //Transformation to ensure muS is in the open interval (0,1)



	theta_S0~uniform(0,1); //orig lognormal(0,1?)
	tau~uniform(0,100); //orig uniform(0,100)

// Likelihood
	DEP ~beta(muS*tau, (1-muS)*tau);


R Script:

data <- read.csv(‘Model_data.csv’)
temp_data <- data[1:8412,] #Uses my initial values for 8412, but ignores them for 8413

N <- nrow(temp_data)
P <- length(unique(temp_data[,1]))


dat_list <- list(N,P,NUM = NUM, IND = IND, DEP = DEP)

create initial estimates

init = function() list(theta_S0 = runif(1,0,1),
theta_r = rnorm(1,.147,.1),
tau = runif(1,75,100),
omega_baseline = rlnorm(1,log(.405)-.5,1),
omega_rate = rlnorm(1,log(.206)-.5),
eta_baseline = rep(0,P),
eta_rate = rep(0,P))

parameters <- c(“theta_S0”,“theta_r”,“omega_baseline”,“omega_rate”,“tau”)

nChains <- 1
nPost <- 1
nBurn <- 1
nThin <- 1

nIter <- (nPost + nBurn) * nThin
nBurnin <- nBurn * nThin

fit <- stan(model_name =“Model”, paste(model_name, “.stan”, sep = “”),
data = dat_list,
pars = parameters,
iter = nIter,
warmup = nBurnin,
thin = nThin,
init = init,
chains = nChains,
verbose = TRUE,
control=list(adapt_delta=0.9999, stepsize = 0.01, max_treedepth =20)

Model_data.csv (732.5 KB)

Could you post why you think Stan is ignoring your initial values? What are the messages? More likely with a larger amount of data your initial estimates are outside of the typical set and you get the usual exploratory behavior you would expect from widely dispersed initial values.

Sure, I’ll put that on.

You don’t want to ever be running Stan with warmup = 1. It needs to find the typical set and adapt. Back off from (-2, 2) to (-1, 1) or smaller and make sure you’re not using those interval priors (which initialize in the middle of the interval).

In response to your question, I get the following error only when I change the number of rows in my data file to include 8413 or higher: “Rejecting initial value: Log probability evaluates to log(0), i.e. negative infinity. Stan can’t start sampling from this initial value.” This message repeats 100 times and then says “Initialization between (-2, 2) failed after 100 attempts. Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.” Also, when I perform the command “get_inits(fit)”, it returns “list()”. Now, when I change the number of rows to include in my data to be less than or equal to 8412, the code initializes just fine, it performs the simulation, and the command “get_inits(fit)” returns:
[1] 0.59

[1] -0.043

[1] 76

[1] 0 0 0 0 0 0 … (many repeated values)
[1] 0 0 0 0 0 0 … (many repeated values)

[1] 1.4

[1] 0.15

All this shows that stan used my initial values, but in the former case, it ignored them, and used the default (-2,2) range. To address what you said additionally Sakrejda, I have even tried removing the random sampling in my initial value choices, and just made them fixed. However, the same issue arose.

In response to your statements Bob, I used warmup =1 to simply speed along the simulation, not caring about the output, but rather just to see how the initialization was handled. And yes, I do know to reduce the value of init_r to be much smaller than 2, but it seems in the case above, stan defaulted to using that range outside of my control.

Let me know what you guys think. Thanks!

Ok, it’s not ignoring your initial values. It tries to start there, fails, and then tries to find a random starting point. This will occur, e.g., if data point 8413 has a probability of occurrence of zero (or numerically indistinguishable from zero). The next step is to print out some of your intermediate values (maybe particularly for that data point, maybe for more data points, and find out where the calculation goes wrong. For complex models it’s possible to just get to a numerical zero if you don’t have great starting values.

Ok that makes sense that stan is still attempting my initial values but rejecting them. A couple questions for my own conceptual understanding. 1) Is the log being taken of my likelihood function for the evaluation of the initial values? 2) If so, is the beta function challenging to be dealt with numerically? 3) Do print statements work with vectorized quantities? I ask that last question because I attempted to put in the line “print(muS[8413])” and it didn’t work. How and where would you recommend putting in print statements for troubleshooting?

@sakrejda Is that really the behavior of RStan? I’d rather have it just reject if it can’t use the initial values. Having it back off to random seems like a recipe for confusing the user.

We haven’t found many cases where we need to initialize Stan’s HMC samplers.

1 Like

Stan programs define log posterior densities, so everything’s on the log scale. The beta function isn’t that bad numerically—it uses lgamma, the gamma function on the log scale.

I’m afraid troubleshooting by print was broken in the recent command refactor of the commands and won’t reappear until 2.16.

The random inits are on the unconstrained scale—there’s a chapter in the manual that explains how they get transformed. For example, a positive constrained variable is initialized between exp(-2) and exp(2) because exp() is used to map an unconstrained value to a positive value.

IDK for sure but that’s what his messages say and it’s typical r behavior
to try to do too much fancy stuff.

Well thanks for the input. This is going to be challenging I’m sure. This is only a base model, as the main one will have several more fixed effects, and needs to make sure of much more data. If you guys have any further input regarding how best to identify initial values, please let me know.

If you want to troubleshoot by print you could still edit the .hpp file generated by CmdStan, it’s relatively straightforward… :/

I usually write my models so that parameters are all expected to be about N(0,1), although that’s sometimes hard to do. That tau initialized at 76 has me concerned—I’m not sure how well that even does numerically.

We’re about to release 2.16, which will fix the bug where the error messages during init were inadvertently thrown away rather than displayed.

In general, you shouldn’t need custom inits. What you do need is to make sure that every value of the parameters that satisfies the declared constraints has a finite log density.

I’m pretty sure we’re also going to change RStan so that it no longer falls back to randomly guessing if the inits supplied by the user fail. My own feeling, though it’s not very R-like, is that if the user specifies something the wrong way, the program should stop and let them know. The R way seems to be to fall back to guessing what they might want to help them along. I find it gets really confusing for users, as exemplified here.

Regarding this post, the issue turned out to be data that took on values not defined for the log posterior. A simple transformation addressed these issues motivated by everyone’s suggestions. Thanks.