Does Stan take the log in the background?

Hi, I am working on a survival analysis model based on an example I found online (link posted below). The professor who posted it said that when he wrote the code in 2016, with an older version of Stan, it worked without any problems, but with the newest version of Stan I get the “initial value rejected” error. I wrote out the equivalent JAGS code, and it samples successfully. I have been working on this example for weeks and am feeling burnt out. I asked about it here before, got some suggestions, and tried several ways to make it start sampling.

In this model, Y is a matrix that contains only 0s and 1s, dL0 follows a gamma distribution, and exp() can never be negative. Therefore, Idt should be greater than or equal to 0.

 Idt[i, j] = Y[i, j] * exp(beta[1]*pscenter[i] + beta[2]*hhcenter[i] + beta[3]*ncomact[i]  + beta[4]*rleader[i] + beta[5]*dleader[i] + beta[6]*inter1[i] + beta[7]*inter2[i]) * dL0[j];

However, if I wrap Idt in an absolute value, it starts sampling. This is weird to me. Apparently Stan does something in the background, but I am not sure what it is.

 dN[i, j] ~ poisson(fabs(Idt[i, j]));  

Or, if I exclude the 0s, it can also start sampling.

if (Y[i, j] == 1) {
    dN[i, j] ~ poisson(Idt[i, j]);
}

I am really confused here. Could anyone explain what Stan is doing that makes this difficult to sample? Any suggestions are appreciated, thanks!

library(ggplot2)
library(plyr)
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

cox_model <- 
"data {
    int<lower=0> Nsubj; 
    int<lower=1> T;
    vector[Nsubj] pscenter;
    vector[Nsubj] hhcenter;
    int<lower=0,upper=1> ncomact[Nsubj];
    int<lower=0,upper=1> rleader[Nsubj];
    int<lower=0,upper=1> dleader[Nsubj];
    vector[Nsubj] inter1;
    vector[Nsubj] inter2;
    int<lower=0,upper=1> FAIL[Nsubj];
    int<lower=0> obs_t[Nsubj];
    int<lower=0> t[T+1];
    
} 
transformed data {
    int<lower=0> Y[Nsubj, T];
    int<lower=0> dN[Nsubj, T];
    
   
    for(i in 1:Nsubj) {
        for(j in 1:T) {
            Y[i, j] = int_step(obs_t[i] - t[j]);
            dN[i, j] = Y[i, j] * int_step(t[j + 1] - obs_t[i]) * FAIL[i];
        }
    }
}
parameters {
    vector[7] beta;
    real<lower=0> c;
    real<lower=0> r;
    vector[T] dL0;   
}
transformed parameters {
    vector[T] mu;
    matrix[Nsubj, T] Idt;
    vector[T] dL0_star;
    
    for(j in 1:T) {
        for(i in 1:Nsubj) {
            
            Idt[i, j] = Y[i, j] * exp(beta[1]*pscenter[i] + beta[2]*hhcenter[i] + beta[3]*ncomact[i] 
                        + beta[4]*rleader[i] + beta[5]*dleader[i] + beta[6]*inter1[i] + beta[7]*inter2[i]) * dL0[j];
        }     
        dL0_star[j] = r * (t[j + 1] - t[j]);
        mu[j] = dL0_star[j] * c;
    }
} 
model {
    for(j in 1:T) {
        for(i in 1:Nsubj) {
            dN[i, j] ~ poisson(Idt[i, j]);  
        }     
        dL0[j] ~ gamma(mu[j], c);
    }
    
    c ~ gamma(0.0001, 0.00001);
    r ~ gamma(0.001, 0.0001);
    beta ~ normal(0.0, 100000);
}"

data <- read_rdump('survival.data.r')
inits <- list(c1 = list(beta = c(-.36, -.26, -.29, -.22, -.61, -9.73, -.23), c = 0.01, r = 0.01,
    dL0 = rep(1, data$T)))
fit <- stan(model_code=cox_model, data=data, init=inits, iter=2000, chains=1, verbose=F)

data:


code: https://github.com/fonnesbeck/stan_workshop_2016/blob/master/code/survival.r

Need a lower bound on dL0: vector<lower=0>[T] dL0;


Thanks. I tried that before, and it did not fix the problem.

If you try something that corrects an error but fails to solve your problem, please update your reproducible example or post a new reply with the complete updated example.
Without those updates, anybody who volunteers to help you is wasting their time, which will quickly limit the responses you get.

Sorry, I forgot to mention it. I did try setting a lower bound on dL0,

vector<lower = 0>[T] dL0;

and I also tried replacing the 0 with a very small number.

vector<lower=0.000001>[T] dL0;

Neither of them solved the problem.

Well, it solves one of your problems.

Remaining problems include the BUGS-style priors and not using the poisson_log form of the likelihood, which necessitates writing Idt[i, j] as

            Idt[i, j] = log(Y[i, j]) + (beta[1]*pscenter[i] + beta[2]*hhcenter[i] + beta[3]*ncomact[i] 
                        + beta[4]*rleader[i] + beta[5]*dleader[i] + beta[6]*inter1[i] + beta[7]*inter2[i]) + log(dL0[j]);
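
With Idt[i, j] now on the log scale, the sampling statement presumably changes to match; a minimal sketch of the corresponding model-block line, keeping the original variable name:

            dN[i, j] ~ poisson_log(Idt[i, j]);

poisson_log exponentiates its argument internally, which is more numerically stable than wrapping the linear predictor in exp() and calling poisson.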

Then it runs, but there are 961 divergent transitions out of 1000 post-warmup iterations, so there is still a long way to go to make this example viable.


Anything that worked before should work now.

The only thing Stan does in the background is add the log Jacobian of the constraining transforms for any constrained variable. All the density calculations are on the log scale, so when you write y ~ poisson(lambda) it has the same effect as target += poisson_lpmf(y | lambda) (modulo dropping some constants that aren’t needed for sampling or optimization).
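
For example, here is a minimal Stan program (an illustration, not from the original posts) showing the two equivalent forms:

data {
    int<lower=0> y;
}
parameters {
    real<lower=0> lambda;
}
model {
    // The sampling statement adds log Poisson(y | lambda) to the target,
    // dropping additive constants that do not depend on lambda:
    y ~ poisson(lambda);
    // Equivalent up to those constants:
    // target += poisson_lpmf(y | lambda);
}

The lower bound on lambda is also an example of what happens in the background: Stan samples lambda on the unconstrained log scale and adds the log Jacobian of the exp transform automatically.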

Thanks, Bob. Sorry, I am still learning Stan. Do you mean that if my data follow a Poisson distribution with rate lambda, I should use
y ~ poisson_log(lambda)
instead of
y ~ poisson(lambda)?

No, you need the logarithm of lambda in the same way I redefined Idt[i,j] in log-units above.

Note that we already went through one iteration of trying to come up with solutions to this in an earlier thread; it would be good to include a link to that thread for clarity. Anyway, one problem I think the model might have is that Y[i, j] can contain zeros (as you mention here), although I don’t know if fixing this will completely solve the issues. Here’s my earlier post about it: Survival analysis: initial values rejected

Look in the manual. It has the definitions of all of these functions.

y ~ poisson_log(alpha)

is equivalent to

y ~ poisson(exp(alpha))

In general, poisson_log_lpmf(y | alpha) = poisson_lpmf(y | exp(alpha)), and both increment the target with \log \mathsf{Poisson}(y \mid \exp(\alpha)).

Thanks, Tomi. I tried excluding those 0s, and it did start sampling, and the results “kind of” match the frequentist method. By “kind of” I mean that of the 7 beta estimates, beta1 is off but the remaining 6 betas all match up. And I got a warning that there were more than 1000 divergent transitions out of 2000 iterations.

Hi, Ben, thanks for your reply. I did try rewriting the code your way, and it did start sampling as you mentioned. But the estimates did not match the frequentist method; instead, they matched the estimates given by taking the absolute value of Idt. That is, instead of

 dN[i, j] ~ poisson(Idt[i, j]);

I did

 dN[i, j] ~ poisson(fabs(Idt[i, j]));

And I am not quite sure what is going on here, since JAGS gives me really good estimates, very similar to the frequentist method.

You want to do

dN[i, j] ~ poisson_log(log_Idt[i, j]);

in Stan, but you have to compute the logarithm of Idt[i, j] in a numerically stable way.
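
One way to do that (a sketch, not code from the thread; eta is a hypothetical name) is to note that log(Y[i, j]) is negative infinity whenever Y[i, j] == 0, so rather than adding that term, apply the likelihood only to cells in the risk set, where log(Y[i, j]) is exactly zero:

// Sketch: keep the log intensity finite by skipping cells with Y[i, j] == 0.
// Those cells have intensity exactly 0 and dN[i, j] == 0, so they contribute
// nothing to the likelihood anyway.
for (i in 1:Nsubj) {
    real eta = beta[1]*pscenter[i] + beta[2]*hhcenter[i] + beta[3]*ncomact[i]
               + beta[4]*rleader[i] + beta[5]*dleader[i]
               + beta[6]*inter1[i] + beta[7]*inter2[i];
    for (j in 1:T) {
        if (Y[i, j] == 1) {
            dN[i, j] ~ poisson_log(eta + log(dL0[j]));
        }
    }
}

Hoisting eta out of the inner loop also avoids recomputing the linear predictor T times per subject; dL0 still needs its positive lower bound so that log(dL0[j]) is finite.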

IIRC, taking the absolute value of a parameter is a bad idea in Stan because it can introduce discontinuities in the gradient of the log posterior, so a statement like dN[i, j] ~ poisson(fabs(Idt[i, j])) is ill-advised.


Thanks. Yeah, I know taking the absolute value is not the solution; I am doing it just to see what is going on and to help me pin down the problem. The whole thing is just weird to me, because taking the absolute value makes it sample, yet apparently Idt cannot be a negative number based on my model.

Yes, that’s what I did: I took the log of Idt and used poisson_log. It samples, but the estimates of the betas are not correct. Can you explain a little more about how to write Idt in a numerically stable way? Thanks!

Here is what I did before:

 log_Idt[i, j] = log(Y[i, j]) + (beta[1]*pscenter[i] + beta[2]*hhcenter[i] + beta[3]*ncomact[i] 
               + beta[4]*rleader[i] + beta[5]*dleader[i] + beta[6]*inter1[i] + beta[7]*inter2[i]) + log(dL0[j]);

Oh, that’s what I did. Sorry, I thought you were talking about something else. Thanks! But this approach did not work in this case; it gave different estimates from the frequentist method.

The point of Bayesian estimation is not to produce (penalized) maximum likelihood estimates. If you want those, just use the methods you’re comparing against.

A Bayesian posterior is a joint probability distribution over the parameters, not a point estimate. In the limit as N \rightarrow \infty, in well-behaved models (with concentration of measure in the posterior), the maximum likelihood estimates and posterior means coincide.

The way to test whether a model’s implementation is working is to simulate fake data from known parameters and check that the fit recovers them.
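
For instance, a fake-data generator could mirror the model’s likelihood. This is a sketch with made-up inputs: beta_true and dL0_true are assumed “true” values, and the seven covariates are stacked into one matrix X rather than passed as separate vectors.

data {
    int<lower=0> Nsubj;
    int<lower=1> T;
    matrix[Nsubj, 7] X;                // covariates, one column per beta
    vector[7] beta_true;               // assumed "true" coefficients
    vector<lower=0>[T] dL0_true;       // assumed "true" baseline hazard increments
    int<lower=0,upper=1> Y[Nsubj, T];  // risk-set indicators
}
generated quantities {
    int dN_sim[Nsubj, T];
    for (i in 1:Nsubj) {
        for (j in 1:T) {
            if (Y[i, j] == 1) {
                dN_sim[i, j] = poisson_log_rng(dot_product(X[i], beta_true)
                                               + log(dL0_true[j]));
            } else {
                dN_sim[i, j] = 0;  // no events outside the risk set
            }
        }
    }
}

Running this with algorithm = "Fixed_param" in rstan yields dN_sim, which can be fed back into the model above to check whether the posterior recovers beta_true.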

[Edit: I forgot to mention that you can see this with any asymmetric distribution. For example, a random variable distributed as \mathsf{Beta}(\alpha, \beta) has a mean of \alpha / (\alpha + \beta), whereas its mode (max) is at (\alpha - 1) / (\alpha + \beta - 2) when it exists. For \mathsf{Beta}(2, 5), the mean is 2/7 \approx 0.29 while the mode is 1/5 = 0.2. Clearly these have the same limit as \alpha + \beta \rightarrow \infty, but they can be far apart in finite examples.]