Why does creating an intermediate variable break my working model, and how can I fix it?

Dear all,

(I am using CmdStan in R.)

Thank you for your attention. I am writing to ask why creating an intermediate variable breaks a model that otherwise works, and how to fix it. Specifically, I have a basic model that works perfectly, shown below:

data{
    int<lower=1> N;                                   // number of collected estimates
    int<lower=1> N_papers;                            // number of papers
    vector[N] coef;                                   // reported coefficient estimates
    array[N] int<lower=1, upper=N_papers> paper_nr;   // paper index for each estimate
    vector<lower=0>[N] se;                            // reported standard errors
}
parameters{
    vector<lower=0>[N] coefhat;
    real<lower=0> df;
    vector<lower=0>[N_papers] coefpaper;
    real<lower=0> sigma_paper;
    real mu_l;
    real<lower=0> sigma_l;
}
model{
    // priors:
    sigma_paper ~ normal( 0 , 5 );
    mu_l ~ normal( 1 , 5 );
    sigma_l ~ normal( 0 , 5 );
   
    coef ~ normal( coefhat , se );
    coefhat ~ student_t(df , coefpaper[paper_nr] , sigma_paper );
    coefpaper ~ lognormal( mu_l , sigma_l );
}

Now, since I want to add a linear-regression component to the model, I need to create an intermediate variable to serve as the dependent variable. To check the validity of this step, I first adapt my basic model as shown below (the commented-out line is the final version I want to reach, where Q is the transformed matrix of independent variables):

data{
    int<lower=1> N;
    int<lower=1> K;
    int<lower=1> N_papers;
    vector[N] coef;
    array[N] int<lower=1, upper=N_papers> paper_nr;
    vector<lower=0>[N] se;
}
parameters{
    vector<lower=0>[N] coefhat;
    real<lower=0> df;
    vector<lower=0>[N_papers] coefpaper;
    real<lower=0> sigma_paper;
    real mu_l;
    real<lower=0> sigma_l;
}
model{
    vector[N] omega;
    sigma_l ~ normal( 0 , 5 );
    sigma_paper ~ normal( 0 , 5 );
    mu_l ~ normal( 1 , 5 );
    
    coef ~ normal( coefhat , se );
    // omega = Q * beta_tilde + coefpaper[paper_nr];
    omega = coefpaper[paper_nr];
    coefhat ~ student_t(df , omega, sigma_paper );
    omega ~ lognormal( mu_l , sigma_l );
}

Now the model has difficulty fitting (the four chains do not mix). I have tried adjusting the initial values, but that does not help.

Could you please share your thoughts on this case and suggest any approaches I could try to fix it?

Thank you very much.
Elon

Could you post the code for the entire model? It will make it easier to understand what is happening in your model block.


Hi Corey,

I have updated the details. Could you please take another look?

Thank you.

Best,
Jilong

Have you tried using a print() statement to see what omega actually looks like? It seems that you are making omega a vector of length N, but defining its value as equal to the coefpaper parameter for only those entries of 1:N that are indexed by the integer array paper_nr.

I am not sure how Stan handles the way you’ve defined omega, but a print() statement would be revealing.

(If this is, for instance, producing NaN values in the omega vector where omega is not defined, then those NaN values would be read as data in target += lognormal_lpdf(omega | mu_l, sigma_l). They would also be read in as the location parameters in target += student_t_lpdf(coefhat | df, omega, sigma_paper) wherever omega is not defined.)

Someone more familiar with Stan’s backend could tell you exactly what is going to happen with omega (and I cannot), but if you’re looking to investigate it yourself while you’re waiting for a response, that is where I would start.
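As a minimal sketch of that kind of check (a fragment of the model block only, reusing the declarations from your model above), you could print both vectors and compare them:

model{
    vector[N] omega = coefpaper[paper_nr];
    // print() writes to the console on every log-density evaluation,
    // so expect a lot of output; a single short chain is enough for checking.
    print("coefpaper = ", coefpaper);
    print("omega = ", omega);
    // ... rest of the model block as before
}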


Hi Corey,

Thanks for your suggestion; I did check how omega enters the model estimation.

To do so, I first moved

vector<lower=0>[N] omega;
omega = coefpaper[paper_nr];

to the transformed parameters block so that I can print it from the Stan fit output (it seems that the fit object does not store intermediate variables created in the model block).

data{
    int<lower=1> N;
    int<lower=1> N_papers;
    vector[N] coef;
    array[N] int<lower = 1, upper = N_papers> paper_nr;
    vector[N] se;
}
parameters{
    vector<lower=0>[N] coefhat;
    real<lower=0> df;
    vector<lower=0>[N_papers] coefpaper;
    real<lower=0> sigma_paper;
    real mu_l;
    real<lower=0> sigma_l;
}
transformed parameters{
    vector<lower=0>[N] omega;
    omega = coefpaper[paper_nr];
    // omega = Q * beta_tilde + coefpaper[paper_nr];
}
model{  
    sigma_paper ~ normal( 0 , 5 );
    mu_l ~ normal( 1 , 5 );
    sigma_l ~ normal( 0 , 5 );
    
    coef ~ normal( coefhat , se );
    coefhat ~ student_t(df , omega, sigma_paper);
    omega ~ lognormal( mu_l , sigma_l );
}

I then found that all N estimated omegas are identical, which is not what I expected. So it seemed that the vectorized indexing in the transformed parameters block was failing, and I switched to a loop instead:

transformed parameters{
    vector<lower=0>[N] omega;
    for (n in 1:N) {
        omega[n] = coefpaper[paper_nr[n]];
    }
    // omega = Q * beta_tilde + coefpaper[paper_nr];
}

However, this does not work either; the generated omegas are still all the same. Any thoughts on this?

Warm regards,
Jilong

I’m not sure, but this is where print() statements come in handy. You need to know where the value of omega is coming from in the first place. I would put a print() statement on omega and coefpaper and compare them. You can do so inside of the loop, for instance. You could also do so in the model block and see if omega keeps the value from the transformed parameters block.

It’s not clear to me whether you are thinking of the lognormal as a prior on omega (where it is otherwise left with the improper uniform distribution inherited from coefpaper), or of omega as a parameter fed into a lognormal likelihood. If it’s the latter, consider using target += lognormal_lpdf(omega | mu_l, sigma_l) instead of omega ~ lognormal(mu_l, sigma_l). And, if it is the latter, note that the omega vector is only assigned a value where coefpaper is indexed into by paper_nr.

Also, if you’re not interested in knowing what omega is outside of trying to diagnose this issue, you don’t need it to be a transformed parameter – you can just investigate it with print() statements and leave it in the model block.

Is the model you want to fit the one with the Q * beta_tilde? If so, how do you declare Q and beta_tilde?

If both Q and beta_tilde are parameters, then you have a multiplicative non-identifiability and you’ll get fits of different signs (for instance, flipping the sign of both Q and beta_tilde). Pinning one of them to be positive will solve the sign problem, but it won’t solve the scaling problem (you can divide Q by c and multiply beta_tilde by c and get the same answer). You’ll also need to account for the change of variables if you want to put a prior on omega rather than on Q and beta_tilde directly.
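To illustrate the scaling problem with a one-line identity: for any constant c != 0, Q * beta_tilde = (Q / c) * (c * beta_tilde), so the likelihood alone cannot tell the two scalings apart.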


That’s right. The only difference between a transformed parameter and a local variable in the model block is that transformed parameters are saved and are available in the generated quantities block.
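A minimal fragment illustrating that difference, using hypothetical names omega_tp and omega_local and assuming the declarations from the models above:

transformed parameters{
    // saved in the output and visible in the generated quantities block
    vector[N] omega_tp = coefpaper[paper_nr];
}
model{
    // exists only while the model block is evaluated; never saved
    vector[N] omega_local = coefpaper[paper_nr];
}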


Hi Bob,

Thanks for your attention.

My final intention is to have a regression in the second distribution (the Student-t distribution). Q is the transformed matrix of the covariate matrix x, and beta_tilde is the transformed coefficient vector (I recover the original coefficients at the end via vector[K] beta = R_inv * beta_tilde).

transformed data{
    real s = sqrt(N - 1.0);
    matrix[K,K] R = qr_thin_R(x)/s;
    matrix[N,K] Q = qr_thin_Q(x)*s;
    matrix[K,K] R_inv = inverse(R);
}

As they are not related to my current question, I did not show them and only kept the commented-out line omega = Q * beta_tilde + coefpaper[paper_nr]; to explain why I want such an intermediate variable omega.

It sent me off track guessing that Q was a parameter. Rather than continuing to guess, could you please provide the full model definition that you are trying to fit in a single place so I don’t have to guess how it fits together?

P.S. Matrix inverses are numerically unstable, so rather than computing inverse(R) * beta_tilde, it’s more stable to use matrix division, R \ beta_tilde, which produces the same answer. The division may be less efficient if you would otherwise reuse inverse(R), and the instability may not matter if your R is well conditioned.
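A minimal sketch of that substitution in the generated quantities block, assuming R and beta_tilde are declared as in your model:

generated quantities{
    // R \ beta_tilde solves R * beta = beta_tilde directly,
    // avoiding the explicit inverse; same result as R_inv * beta_tilde
    vector[K] beta = R \ beta_tilde;
}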

Hi Bob,

My apologies for the inconvenience. Please find the full model below, along with some potentially useful context. I want to do a meta-analysis of a specific parameter X reported in the literature. My model has three levels: coef and se are the collected data, which are assumed to follow a normal distribution around the true effect X-hat to be estimated. Since a paper can contribute multiple estimates (e.g., from different subsets of data), I assume the X-hats from the same paper follow a Student-t distribution whose location parameter depends on a series of covariates. Finally, I assume these paper-level means (declared as omega) follow a log-normal distribution.

data{
    int<lower=1> N;
    int<lower=1> K;
    int<lower=1> N_papers;
    vector[N] coef;
    array[N] int<lower=1, upper=N_papers> paper_nr;
    vector[N] se;
    matrix[N,K] x;
}
transformed data{
    real s = sqrt(N - 1.0); 
    matrix[K,K] R = qr_thin_R(x)/s;
    matrix[N,K] Q = qr_thin_Q(x)*s;
    matrix[K,K] R_inv = inverse(R);
}
parameters{
    vector<lower=0>[N] coefhat;
    real<lower=0> df;
    vector<lower=0>[N_papers] coefpaper;
    real<lower=0> sigma_paper;
    real mu_l;
    real<lower=0> sigma_l;
    vector[K] beta_tilde;
}
model{
    vector[N] omega;
    sigma_l ~ normal( 0 , 5 );
    sigma_paper ~ normal( 0 , 5 );
    mu_l ~ normal( 1 , 5 );
    
    coef ~ normal( coefhat , se );
    omega = Q * beta_tilde + coefpaper[paper_nr];
    coefhat ~ student_t(df , omega , sigma_paper );
    omega ~ lognormal( mu_l , sigma_l );
}
generated quantities{
    real mu;
    mu = exp(mu_l + (sigma_l^2)/2); // mean of the lognormal
  
    real sigma;
    sigma = (exp(sigma_l^2) - 1) * exp(2*mu_l + sigma_l^2); // variance of the lognormal (note: not the sd)
    
    array[N] real coefpp;
    coefpp = student_t_rng(df , coefpaper[paper_nr] , sigma_paper ); // posterior predictive draws at the estimate level
  
    vector[K] beta = R_inv*beta_tilde; //obtain coefficients on original scale
}

I hope this is clear enough for your checking.

By the way, it would be very helpful if you could temporarily set the linear-regression part aside and tell me why the first model below works and gives reasonable estimates, while the second gives coefpaper values that are all very close to one another, resulting in almost parallel chains for sigma_l and mu_l in the traceplot.

model{
    sigma_l ~ normal( 0 , 5 );
    sigma_paper ~ normal( 0 , 5 );
    mu_l ~ normal( 1 , 5 );
    
    coef ~ normal( coefhat , se );
    coefhat ~ student_t(df , coefpaper[paper_nr], sigma_paper );
    coefpaper ~ lognormal( mu_l , sigma_l );
}
model{
    vector[N] omega;
    sigma_l ~ normal( 0 , 5 );
    sigma_paper ~ normal( 0 , 5 );
    mu_l ~ normal( 1 , 5 );
    
    coef ~ normal( coefhat , se );
    omega = coefpaper[paper_nr];
    coefhat ~ student_t(df , omega, sigma_paper );
    omega ~ lognormal( mu_l , sigma_l );
}

To me, they are essentially the same, no? Thank you.