Partial Pooling Scale Parameter is Inf

I’m brand new to Stan and working on fitting my first model. I’m attempting to use partial pooling to fit individual slopes and intercepts for a number of groups. The model seems to be converging fine and the output makes sense. However, I’m getting a few informational messages as soon as I start sampling, and I’m not sure how to handle them, or whether I should worry about them at all.

I’m also getting an Rhat of 1.33 for lp__, which is a little surprising since it’s the only quantity whose Rhat is not near 1. I’m not sure if I need to worry about this either.

messages:

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: student_t_lpdf: Scale parameter is inf, but must be finite!  (in 'unknown file name' at line 55)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

This is my model:

data {
  int<lower=1> N;  // number of data points
  int<lower=1> ID; // number of unique groups
  int id[N];
  real x[N];
  real z[N];
  real y[N];
}
parameters {
  real beta[ID];
  real beta_2[ID];
  real beta_2_sq[ID];
  real beta_3[ID];
  real<lower=0> eps;
  real<lower=0> nuMinusOne;

  real beta_mu;
  real beta_2_mu;
  real beta_2_sq_mu;
  real<lower=0, upper=1> beta_3_mu;
  real<lower=0> beta_sigma;
  real<lower=0> beta_2_sigma;
  real<lower=0> beta_2_sq_sigma;
  real<lower=0> beta_3_sigma;
}
transformed parameters {
  real<lower=0> nu;
  nu = nuMinusOne + 1;
}
model {
  beta_mu ~ normal(0, 10);
  beta_2_mu ~ normal(0, 10);
  beta_2_sq_mu ~ normal(-.5, 10);
  beta_3_mu ~ normal(.5, 1);

  beta_sigma ~ uniform(1.0E-2, 1.0E+2);
  beta_2_sigma ~ uniform(1.0E-2, 1.0E+2);
  beta_2_sq_sigma ~ uniform(1.0E-2, 1.0E+2);
  beta_3_sigma ~ uniform(1.0E-2, 2);
  eps ~ uniform(1.0E-2, 1.0E+3);
  nuMinusOne ~ exponential(1/29.0);

  beta ~ normal(beta_mu, beta_sigma);
  beta_2 ~ normal(beta_2_mu, beta_2_sigma);          // vectorized
  beta_2_sq ~ normal(beta_2_sq_mu, beta_2_sq_sigma); // vectorized
  beta_3 ~ normal(beta_3_mu, beta_3_sigma);          // vectorized

  for (i in 1:N) {
    y[i] ~ student_t(nu,
                     beta[id[i]] + beta_2[id[i]] * x[i] + beta_2_sq[id[i]] * x[i]*x[i] + z[i]*beta_3[id[i]],
                     eps);
  }
}

Hey there! Welcome to the Stan forum! :)

As long as those informational messages are few and occur during warm-up, this should be fine.

The Rhat of 1.33 on lp__, though, is not so good.

Do you have any other warnings or convergence issues? Most importantly are there any divergent transitions?

You can improve your model code in a few places, though. So maybe let’s start there and see if the issues go away. I’ll just list some things and then my version of the model code.

  • You can precompute x[i]*x[i] in the transformed data block, e.g. as vector[N] x_sq = square(x); (square() works element-wise on vectors).
  • You are using a lot of Uniform distributions explicitly, which you don’t have to do. Just provide the upper and lower bounds in the parameters block, and a Uniform distribution over that interval is implied (see the short sketch after this list).
  • Usually Uniform priors (even with upper bound) are not the best choice for scale parameters in multi-level models. Half-Normal, Half-Student-t or Exponential priors often work well.
  • The priors seem to be pretty wide. Often tighter priors improve model fit considerably, because of their regularizing effect.
  • I’m pretty sure you can vectorize the “likelihood” part (y[i] ~ ...) of your model.
  • If there are any divergent transitions, you might want to try a non-centered parameterization (NCP) for your multi-level parameters. I’ll not code this, because you said your fit is mainly fine. Note that a lot of people find they have to use NCP for their multi-level models – but that depends on the model and the data…
  • You can declare and assign variables on one line. Doesn’t really make a difference, but looks nicer. ;)
  • (More like a personal preference of mine…) You can use arrays of reals for your data (x, z, y) and parameters (beta*[ID]), but I tend to prefer vectors. Often you can use either. I’ll rewrite the code with vectors, but that’s not really an issue.
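
Picking up the second bullet, here is a minimal sketch of an implied Uniform prior via bounds, using the bounds from your original uniform(1.0E-2, 1.0E+2) statements:

parameters {
  // the bounds alone imply beta_sigma ~ uniform(0.01, 100);
  // no explicit uniform statement is needed in the model block
  real<lower=0.01, upper=100> beta_sigma;
}

(I’ll use Half-Student-t priors instead below, as mentioned in the third bullet.) Here’s my version of the full model: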
data {
  int<lower=1> N; //number of data points
  int<lower=1> ID;  //number of unique groups
  int id[N]; 
  vector[N] x;     
  vector[N] z;  
  vector[N] y;  
}
transformed data{
  vector[N] x_sq = square(x);
}
parameters {
  vector[ID] beta;
  vector[ID] beta_2;   
  vector[ID] beta_2_sq;   
  vector[ID] beta_3;   
  real<lower=0> eps; 
  real<lower=0> nuMinusOne ;

  real beta_mu;
  real beta_2_mu ;
  real beta_2_sq_mu ;
  real<lower=0, upper=1> beta_3_mu ;
  real<lower=0> beta_sigma;
  real<lower=0> beta_2_sigma ;
  real<lower=0> beta_2_sq_sigma ;
  real<lower=0> beta_3_sigma ;
}
transformed parameters {
  real<lower=0> nu = nuMinusOne + 1; // can be a one-liner
}
model {
  beta_mu ~ normal(0, 10);
  beta_2_mu  ~ normal( 0 , 10 ) ;
  beta_2_sq_mu  ~ normal(-.5, 10 ) ;

  beta_3_mu  ~ normal(.5 , 1) ;
  // Personally, I would remove the upper/lower bound on this and just give this
  // prior a smaller sd, maybe like
  // beta_3_mu  ~ normal(0.5 , 0.5) ;
  // You can of course still have it with bounds, but maybe then a beta prior works better? 
  // Something like:
  // beta_3_mu ~ beta(2, 2); 

  beta_sigma ~ student_t(5, 0, 10); // still fairly wide prior
  beta_2_sigma ~ student_t(5, 0, 10); // still fairly wide prior
  beta_2_sq_sigma ~ student_t(5, 0, 10); // still fairly wide prior
  beta_3_sigma ~ student_t(5, 0, 10); // still fairly wide prior
  eps ~ student_t(3, 0, 15); // pretty wide prior
  nuMinusOne  ~ exponential(1/29.0);

  beta ~ normal(beta_mu, beta_sigma);
  beta_2 ~ normal( beta_2_mu , beta_2_sigma ) ; // vectorized
  beta_2_sq ~ normal( beta_2_sq_mu , beta_2_sq_sigma ) ; // vectorized
  beta_3 ~ normal( beta_3_mu , beta_3_sigma ) ; // vectorized

  y ~ student_t(nu, beta[id] + beta_2[id] * x + beta_2_sq[id] * x_sq + z * beta_3[id], eps);
  // can also put the linear predictor in a local variable on top of the model block, like
  // vector[N] mu = beta[id] + beta_2[id] * x + beta_2_sq[id] * x_sq + z * beta_3[id];
  // and then just do
  // y ~ student_t(nu, mu, eps);

}

Maybe this’ll make it work a bit better. I have a feeling that tighter priors will alleviate the Rhat issue on lp__, since those should lead to better “identification” of the model.

Cheers,
Max


Hey, thanks for taking a look at this. I’ve been working on implementing your changes, but the model string you sent gives an error when I attempt to compile it.

ValueError: Failed to parse Stan model 'anon_model_b91f6181c79088d068c0bb2c08ee6626'. Error message:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for: 
  vector * vector
Available argument signatures for operator*:
  real * real
  vector * real
  row_vector * real
  matrix * real
  row_vector * vector
  vector * row_vector
  matrix * vector
  row_vector * matrix
  matrix * matrix
  real * vector
  real * row_vector
  real * matrix
No matches for: 
  vector + ill-formed
Available argument signatures for operator+:
  int + int
  real + real
  vector + vector
  row_vector + row_vector
  matrix + matrix
  vector + real
  row_vector + real
  matrix + real
  real + vector
  real + row_vector
  real + matrix
  +int
  +real
  +vector
  +row_vector
  +matrix
No matches for: 
  vector * vector
Available argument signatures for operator*:
  real * real
  vector * real
  row_vector * real
  matrix * real
  row_vector * vector
  vector * row_vector
  matrix * vector
  row_vector * matrix
  matrix * matrix
  real * vector
  real * row_vector
  real * matrix
No matches for: 
  ill-formed + ill-formed
Available argument signatures for operator+:
  int + int
  real + real
  vector + vector
  row_vector + row_vector
  matrix + matrix
  vector + real
  row_vector + real
  matrix + real
  real + vector
  real + row_vector
  real + matrix
  +int
  +real
  +vector
  +row_vector
  +matrix
No matches for: 
  vector * vector
Available argument signatures for operator*:
  real * real
  vector * real
  row_vector * real
  matrix * real
  row_vector * vector
  vector * row_vector
  matrix * vector
  row_vector * matrix
  matrix * matrix
  real * vector
  real * row_vector
  real * matrix
No matches for: 
  ill-formed + ill-formed
Available argument signatures for operator+:
  int + int
  real + real
  vector + vector
  row_vector + row_vector
  matrix + matrix
  vector + real
  row_vector + real
  matrix + real
  real + vector
  real + row_vector
  real + matrix
  +int
  +real
  +vector
  +row_vector
  +matrix
Expression is ill formed.
 error in 'unknown file name' at line 58, column 85
  -------------------------------------------------
    56:   beta_3 ~ normal( beta_3_mu , beta_3_sigma ) ; // vectorized
    57: 
    58:   y ~ student_t(nu, beta[id] + beta_2[id] * x + beta_2_sq[id] * x_sq + z * beta_3[id], eps);
                                                                                            ^
    59:   // can also put the linear predictor in a local variable on top of the model block, like
  -------------------------------------------------

I tried to play around with it a bit and put the mu in the transformed parameters block, but still no luck.

I went back to a for loop just to get things going and to see if things converged. You were 100% correct that a tighter prior would fix the issues.

I’m still getting these messages at the beginning.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: student_t_lpdf: Scale parameter is inf, but must be finite!  (in 'unknown file name' at line 53)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: student_t_lpdf: Scale parameter is inf, but must be finite!  (in 'unknown file name' at line 53)
If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

I’m really confused how that’s even possible. It sounds like it’s an issue with nu, but it seems to be set up correctly as far as I can tell. I also hardcoded nu to the value it was converging to, but I still got the messages.

I would like to try NCP. I coded it before I posted this, but I wasn’t totally sure it was completely correct, and it hit the same vector syntax error as above.

Ohh… sorry! Yeah, I got that wrong. In this version you’d need element-wise vector-vector multiplication using .*. So this expression:

beta[id] + beta_2[id] * x + beta_2_sq[id] * x_sq + z * beta_3[id]

should really be like this:

beta[id] + beta_2[id] .* x + beta_2_sq[id] .* x_sq + z .* beta_3[id]
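
So the full sampling statement becomes:

y ~ student_t(nu,
              beta[id] + beta_2[id] .* x + beta_2_sq[id] .* x_sq + z .* beta_3[id],
              eps);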

I don’t have much time right now – I’ll answer tomorrow, okay?

Cheers,
Max

Perfect, that did it. I’m surprised it fits so much faster now that it’s vectorized. I didn’t really expect that kind of improvement.

And really no rush at all on the NCP. Thanks so much for all the help with this.


Hey there!

I forgot one thing! Your model will most likely fit much better when you center your predictors. This is particularly true for the x_sq variable. Correlation between regressors can sometimes be a problem for Stan, and centering may help.

transformed data{
  vector[N] x_sq = square(x) - mean(square(x)); // not the most efficient, but 'transformed data' is just evaluated once... 
}

Centering the other predictors would also help. Note that this changes the interpretation of the model’s intercept and should thus change your prior for it if that prior is at least weakly informative.
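
For example, centering x and z as well could look like this (just a sketch – the _c names are my own, and you’d then use the centered versions in the linear predictor):

transformed data {
  vector[N] x_c = x - mean(x);                  // centered x
  vector[N] z_c = z - mean(z);                  // centered z
  vector[N] x_sq = square(x) - mean(square(x)); // centered x^2, as above
}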

NCP

I’m not sure which Stan version you are using, but I think it should be possible to use offsets and multipliers. The general idea is to just declare the parameters as:

parameters {
  real<lower=0> eps; 
  real<lower=0> nuMinusOne ;

  real beta_mu;
  real beta_2_mu ;
  real beta_2_sq_mu ;
  real<lower=0, upper=1> beta_3_mu ;
  real<lower=0> beta_sigma;
  real<lower=0> beta_2_sigma ;
  real<lower=0> beta_2_sq_sigma ;
  real<lower=0> beta_3_sigma ;

  vector<offset = beta_mu, multiplier = beta_sigma>[ID] beta;
  vector<offset = beta_2_mu, multiplier = beta_2_sigma>[ID] beta_2;   
  vector<offset = beta_2_sq_mu, multiplier = beta_2_sq_sigma>[ID] beta_2_sq;   
  vector<offset = beta_3_mu, multiplier = beta_3_sigma>[ID] beta_3;   
}

Note that I had to change the order (offset and multiplier have to be declared before you can use them). You can also do NCP for just some of the variables – you’ll have to test whether that is necessary – by removing the offset and multiplier from the ones you want to keep centered (see the sketch below).
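
For example, to non-center only beta and keep the others centered (just an illustration – which parameters need it depends on your model and data):

  vector<offset = beta_mu, multiplier = beta_sigma>[ID] beta; // non-centered
  vector[ID] beta_2;    // centered
  vector[ID] beta_2_sq; // centered
  vector[ID] beta_3;    // centered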

Cheers,
Max

So that worked with the offsets. Just to verify: I do want to drop

  beta ~ normal(beta_mu, beta_sigma);
  beta_2 ~ normal( beta_2_mu , beta_2_sigma ) ; // vectorized
  beta_2_sq ~ normal( beta_2_sq_mu , beta_2_sq_sigma ) ; // vectorized
  beta_3 ~ normal( beta_3_mu , beta_3_sigma ) ; // vectorized

from the model block, right? The model compiles and samples fine either way.

Hey there!

No, you have to leave them in. The offset and multiplier just shift and scale the parameters; you still have to specify the prior distributions. So leave those statements in there.
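
In other words, the model block keeps the priors (a sketch, given the declarations above):

model {
  // ... priors on the group-level means and scales as before ...

  // keep these: they are the priors – the offset/multiplier
  // declarations change the parameterization, not the model
  beta ~ normal(beta_mu, beta_sigma);
  beta_2 ~ normal(beta_2_mu, beta_2_sigma);
  beta_2_sq ~ normal(beta_2_sq_mu, beta_2_sq_sigma);
  beta_3 ~ normal(beta_3_mu, beta_3_sigma);

  // ... likelihood as before ...
}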

Great! :)

Cheers,
Max

No, you have to leave them in. The offset and multiplier just shift and scale the parameters; you still have to specify the prior distributions. So leave those statements in there.

Ah, that makes sense. Thanks again for all the help! This has made a world of difference.
