Initialisation working differently to expected

I am trying to specify initial values for NUTS for the below model. I have tried to specify these in both rstan and pystan but am coming unstuck somewhere. Specifically, when I specify,

fit <- sampling(model, ..., init=function(){list(beta=rep(1, 25))})

I find that the chains do not begin near 1. This is despite there being non-zero posterior probability and finite gradients for beta = rep(1, 25): I have checked this in pystan. To be clear, I am using the following (in R) to extract the full chains (i.e. including warm-up):

temp <- rstan::extract(fit, permuted=F, inc_warmup=TRUE)

Does anyone have any idea where I am going wrong?

functions{
  real logistic_function(matrix x, int[] y, vector beta, real sigma, int N){
    real log_prob = 0.0;
    for(i in 1:N){
      log_prob += -log(1 + exp(-y[i] * dot_product(x[i], beta)));
    }
    log_prob += -1/(2 * sigma^2) * dot_product(beta, beta);
    return(log_prob);
  }
}
data{
  int N;
  int y[N];
  matrix[N, 25] x;
  real sigma;
}
parameters{
  vector[25] beta;
}

model{
  target += logistic_function(x, y, beta, sigma, N);
}

I think that is right, but the first row of temp is a value from the Hamiltonian process that was started at whatever the starting values were.

Thanks. The first row of temp is the below (which I can’t seem to make sense of):

beta[1] beta[2] beta[3] beta[4] beta[5] beta[6]
6.10903019 -2.21460302 0.75340430 -1.30991431 -0.38425056 -0.77346380
beta[7] beta[8] beta[9] beta[10] beta[11] beta[12]
-1.19342098 -0.49706950 -0.35299320 0.16353087 -0.75785689 -0.27273974
beta[13] beta[14] beta[15] beta[16] beta[17] beta[18]
-0.64186167 0.02548292 -0.26294919 0.51684128 1.05918257 -0.69769830
beta[19] beta[20] beta[21] beta[22] beta[23] beta[24]
0.72372475 1.12199262 2.11515626 0.72863217 1.09253213 1.23940066
beta[25] lp__
1.45721016 -516.37782847

I can’t see how these correspond with the init values I’m passing to sampling?

I’m saying it shouldn’t. The starting values you give are transformed to the unconstrained space and then it takes up to 10 leapfrog steps from there, choosing one of those -1 + 2^{10} footprints that it made along the way to be the first row of temp you see there (after retransforming it to the constrained space).

Ok, so the first values are actually those at the end of the first iteration? Rather than at the start of the first iteration that is.

Not sure I follow re: constrained here, since the beta parameters are unconstrained reals?

Yes. You can think of the starting values as time 0 and the first row of temp as time 1. In general, there are transformations back and forth, but I guess it is just an identity transformation in your case.

1 Like

Great, thanks very much. Both make sense!