Does Stan take log in the background

Lilia_Feng · August 21, 2018, 5:13am

Hi, I am working on an survival analysis model, and used an example online (link posted below) . The professor who posted it said when he wrote this code in 2016 with older version of Stan, it worked without any problems. but with the newest version of Stan, I got the “initial value rejected” error. I wrote out Jags code, and it can sample successfully. I have been working on this example for weeks. Feel like a burnout. I have came to ask here, and got some suggestions. tried several ways to make it start sampling.

In this mode, Y is a matrix that contains only 0 or 1. dL0 follows gamma distribution. exp() won’t be a negative number. Therefore, Idt should be greater or equal to 0.

 Idt[i, j] = Y[i, j] * exp(beta[1]*pscenter[i] + beta[2]*hhcenter[i] + beta[3]*ncomact[i]  + beta[4]*rleader[i] + beta[5]*dleader[i] + beta[6]*inter1[i] + beta[7]*inter2[i]) * dL0[j];

However, if the absolute value is set around Idt, it starts sampling. This is weird to me. Apparently Stan did something in the background, but not sure what it is.

 dN[i, j] ~ poisson(fabs(Idt[i, j]));

Or, if I exclude the 0s, it can also start sampling.

if(Y[i,j]==1){
dN[i, j] ~ poisson(Idt[i, j]); }

I am really confused here. Could anyone help me explain what Stan did to make it difficult to sample? Any suggestions are appreciated, thanks!

library(ggplot2)
library(plyr)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

cox_model <- 
"data {
    int<lower=0> Nsubj; 
    int<lower=1> T;
    vector[Nsubj] pscenter;
    vector[Nsubj] hhcenter;
    int<lower=0,upper=1> ncomact[Nsubj];
    int<lower=0,upper=1> rleader[Nsubj];
    int<lower=0,upper=1> dleader[Nsubj];
    vector[Nsubj] inter1;
    vector[Nsubj] inter2;
    int<lower=0,upper=1> FAIL[Nsubj];
    int<lower=0> obs_t[Nsubj];
    int<lower=0> t[T+1];
    
} 
transformed data {
    int<lower=0> Y[Nsubj, T];
    int<lower=0> dN[Nsubj, T];
    
   
    for(i in 1:Nsubj) {
        for(j in 1:T) {
            Y[i,j] =int_step(obs_t[i] - t[j]);
            dN[i,j] =Y[i, j] * int_step(t[j + 1] - obs_t[i]) * FAIL[i];
        }
    }
}
parameters {
    vector[7] beta;
    real<lower=0> c;
    real<lower=0> r;
    vector[T] dL0;   
}
transformed parameters {
    vector[T] mu;
    matrix[Nsubj, T] Idt;
    vector[T] dL0_star;
    
    for(j in 1:T) {
        for(i in 1:Nsubj) {
            
            Idt[i, j] = Y[i, j] * exp(beta[1]*pscenter[i] + beta[2]*hhcenter[i] + beta[3]*ncomact[i] 
                        + beta[4]*rleader[i] + beta[5]*dleader[i] + beta[6]*inter1[i] + beta[7]*inter2[i]) * dL0[j];
        }     
          dL0_star[j]= r * (t[j + 1] - t[j]);
        mu[j] <- dL0_star[j] * c;   
    }
} 
model {
    for(j in 1:T) {
        for(i in 1:Nsubj) {
            dN[i, j] ~ poisson(Idt[i, j]);  
        }     
        dL0[j] ~ gamma(mu[j], c);
    }
    
    c ~ gamma(0.0001, 0.00001);
    r ~ gamma(0.001, 0.0001);
    beta ~ normal(0.0, 100000);
}"

data <- read_rdump('survival.data.r')
inits <- list(c1=list(beta=c(-.36,-.26,-.29,-.22,-.61,-9.73,-.23),  c=0.01, r=0.01, 
    dL0=c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)))
fit <- stan(model_code=cox_model, data=data, init=inits, iter=2000, chains=1, verbose=F)

data:

github.com

fonnesbeck/stan_workshop_2016/blob/master/code/survival.data.r

T <- 73
Nsubj <- 430
t <- c(1, 21, 85, 128, 129, 148, 178, 204,
         206, 210, 211, 212, 225, 238, 241,
         248, 259, 273, 275, 281, 286, 289,
         301, 302, 303, 304, 313, 317, 323,
         344, 345, 349, 350, 351, 355, 356,
         359, 364, 385, 386, 389, 390, 391,
         392, 394, 395, 396, 397, 398, 399,
         400, 406, 415, 416, 426, 427, 434,
         435, 437, 441, 447, 448, 449, 450,
         451, 453, 455, 456, 458, 459, 460,
         461, 462, 463)
obs_t <- c(460, 313, 435, 350, 435, 350, 350, 460, 460, 448, 225, 225, 396, 435, 396, 396, 
    453, 396, 456, 397, 397, 396, 395, 275, 449, 395, 395, 462, 302, 302, 458, 461, 396, 241, 389, 
    458, 304, 304, 395, 395, 364, 460, 415, 463, 396, 459, 441, 435, 396, 458, 437, 396, 356, 356, 
    396, 455, 396, 462, 399, 400, 350, 350, 395, 395, 441, 355, 85, 458, 128, 396, 386, 386, 386, 462, 
    458, 390, 390, 396, 396, 396, 427, 458, 395, 275, 275, 395, 359, 395, 395, 441, 395, 463, 178, 275, 
    463, 396, 396, 259, 396, 396, 458, 441, 396, 463, 396, 463, 435, 396, 437, 396, 398, 463, 460, 462, 
    460, 460, 210, 396, 435, 458, 385, 323, 323, 359, 396, 396, 460, 238, 441, 450, 392, 458, 396, 458,

This file has been truncated. show original

code: https://github.com/fonnesbeck/stan_workshop_2016/blob/master/code/survival.r

hhau · August 21, 2018, 5:52am

Need a lower bound on dl0: vector <lower = 0> [T] dL0;

Lilia_Feng · August 21, 2018, 4:22pm

Thanks. I tried it before, and it did not help fix the problem.

sakrejda · August 21, 2018, 5:06pm

If you try things that correct an error but fail to solve your problem please update your reproducible example or make a new reply with the complete new reproducible example.
Without those updated, anybody who tries to volunteer to help you is wasting their time which will quickly limit the responses you get.

Lilia_Feng · August 21, 2018, 6:37pm

Sorry that I forgot to mention it. I tried to set the lower bound on dL0.

vector<lower = 0>[T] dL0;

and also tried replace 0 by a very small number.

vector<lower = 0.000001> [T] dL0;

Both of them failed to solve the problem.

bgoodri · August 21, 2018, 6:56pm

Well, it solves one of your problems.

Remaining problems include the BUGS priors and not using the poisson_log form of the likelihood, which necessitates writing Idt[i,j] as

            Idt[i, j] = log(Y[i, j]) + (beta[1]*pscenter[i] + beta[2]*hhcenter[i] + beta[3]*ncomact[i] 
                        + beta[4]*rleader[i] + beta[5]*dleader[i] + beta[6]*inter1[i] + beta[7]*inter2[i]) + log(dL0[j]);

Then it runs, but there are 961 divergent transitions out of 1000 post-warmup iterations, so there is still a long way to go to make this example viable.

Bob_Carpenter · August 22, 2018, 10:28am

Anything that worked before should work now.

The only thing Stan does in the background is add the log Jacobian of the constraining transforms for any constrained variable. All the density calculations are on the log scale, so when you write y ~ poisson(lambda) it has the same effect as target += poisson_lpmf(y | lambda) (modulo dropping some constants that aren’t needed for sampling or optimization).

Lilia_Feng · August 22, 2018, 3:42pm

Thanks, Bob. Sorry I am still learning Stan. Do you mean that if I have this lambda that follows Poisson distribution, I should use
y ~ poisson_log(lambda)
instead of
y ~ poisson(lambda)?

bgoodri · August 22, 2018, 5:43pm

No, you need the logarithm of lambda in the same way I redefined Idt[i,j] in log-units above.

to-mi · August 23, 2018, 9:24am

Note that we already went through one iteration of trying to come up with solutions to this in an earlier thread. It would be good to include the link to the previous thread for clarity. Anyway, one problem I think the model might have is that Y[i, j]s can contain zeros (as you mention here), although I don’t know if this will completely solve the issues. Here’s my earlier post about this: Survival analysis: initial values rejected

Bob_Carpenter · August 23, 2018, 9:57am

Look in the manual. It has the definitions of all of these functions.

y ~ poisson_log(alpha)

is equivalent to

y ~ poisson(exp(alpha))

In general, poisson_lpdf(y | alpha) = poisson_log_lpdf(y | log(alpha)), and both are increment the target with \log \mathsf{Poisson}(y \mid \alpha).

Lilia_Feng · August 28, 2018, 2:40pm

Thanks, Tomi, I tried to exclude those 0s, and it did start sampling. and the result “kind of” match with frequentist method. By “kind of”, I mean, there are 7 betas estimates, beta1 is off but the remaining 6 betas all match up. And I got this warning that there are more than 1000 out of 2000 divergent transitions.

Lilia_Feng · August 28, 2018, 2:43pm

Hi, Ben, thanks for your reply. I did try to rewrite the code in your way, and it did start sampling as you mentioned. but the thing is the estimates did not match with frequentist method, but it matched the estimates given by taking absolute value of “Idt” as follows:
instead of

I did
dN[i, j] ~ poisson(fabs(Idt[i, j]));

And I am not quite sure what is going on here, since Jags can give me really good estimates, and it is very similar to frequentist method.

bgoodri · August 28, 2018, 7:37pm

You want to do

dN[i,j] ~ poisson_log(log_of_ldf[i, j]);

in Stan but you have to compute the logarithm of ldt[i,j] in a numerically stable way.

jjramsey · August 28, 2018, 8:29pm

IIRC, taking the absolute value of a parameter is a bad idea in Stan, because that can lead to discontinuities of the gradient of the log-posterior, so a statement like “dN[i, j] ~ poisson(fabs(Idt[i, j]))” is ill-advised.

Lilia_Feng · August 29, 2018, 2:49pm

Thanks, Yeah, I know taking the absolute value is not the solution, and I am doing it just because I want to see what is going on, and help me to define the problem. This whole thing is just weird to me, because taking the absolute value can make it sample but apparently “Idt” could not be a negative number based on my model.

Lilia_Feng · August 29, 2018, 2:51pm

Yes, that’s what I did, I took log of Idt, and let it follow possion_log. It can sample but the estimates of betas are not correct. Can you explain a little more about how to write Idt in a numerically stable way? Thanks

bgoodri · August 29, 2018, 7:56pm

I did before

 log_Idt[i, j] = log(Y[i, j]) + (beta[1]*pscenter[i] + beta[2]*hhcenter[i] + beta[3]*ncomact[i] 
               + beta[4]*rleader[i] + beta[5]*dleader[i] + beta[6]*inter1[i] + beta[7]*inter2[i]) + log(dL0[j]);

Lilia_Feng · August 30, 2018, 7:40pm

Oh, that’s what I did. Sorry that I thought you were talking about something else. Thanks! but this way did not work for this case, and it gave the different estimates from frequentist method.

Bob_Carpenter · September 3, 2018, 10:59am

The point of Bayesian estimation is not to produce (penalized) maximum likelhood estimates. If you want to do that, just use the methods you’re comparing against.

A Bayesian posterior is about joint probability distributions of parameters, not point estimation. In the limit, as N \rightarrow \infty in well-behaved models (with concentration of measure in the posterior), the maxium likelihood estimates and posterior means will be the same.

The way to test if a model’s implementation is working is to simulate fake data and test.

[Edit: I forgot to mention that you can see this with any asymmetric distribution. For example, the mean of a random variable distributed as \mathsf{Beta}(\alpha, \beta) has a mean of \alpha / (\alpha + \beta) whereas its mode (max) is at (\alpha - 1) / (\alpha + \beta - 2) when it exists. Clearly these have the same limit as \alpha + \beta \rightarrow \infty, but they can be far apart in finite examples.]

Topic		Replies	Views
Survival analysis: initial values rejected Modeling	23	1765	August 11, 2018
Problem with using rstan General	40	2010	September 11, 2018
Failure to start because of initial values Modeling	16	3533	July 31, 2017
MCMC sampling does not work when execute Modeling fitting-issues	78	11398	November 1, 2017
Rejecting initial value: Error evaluating the log probability at the initial value Modeling fitting-issues	4	2658	July 7, 2018

Does Stan take log in the background

Related topics