Implementing two-step estimation

mike-lawrence · July 24, 2021, 2:12am

Oh! If I understand this right, then you might save yourself some time by using Stan’s ordered distributions. If you know the cutpoints that define the bins then you submit these as data along with the bin label for each observation, leaving you to do inference on the eta parameter, which I think you can then hack to achieve inference on both the location and scale of a latent normal. Something like:

data{
    int n_bins_plus_2 ; //plus2 bc first and last bins go to -Inf/+Inf, respectively 
    vector[n_bins_plus_2-1] cutpoints ; 
    int n_obs ;
    array[n_obs] int<lower=1,upper= n_bins_plus_2 > bin_for_obs ; // remember that no observation should have the label “1”
}
parameters{
    real mu ;
    real<lower=0> sigma ;
}
model{
    bin_for_obs ~ ordered_logistic( mu/sigma , cutpoints ) ;
}

And then, should you have some other observed data X that you want to include as a predictor to assess the relationship between X and your binned outcome, you can do:

data{
    int n_bins_plus_2 ;
    vector[n_bins_plus_2-1] cutpoints ;
    int n_obs ;
    array[n_obs] int<lower=1,upper= n_bins_plus_2 > bin_for_obs ;
    vector[n_obs] X ;
}
parameters{
    vector[2] mu ; // intercept and effect-of-Y on mu
    vector[2] log_sigma ; // intercept and effect-of-Y on sigma
}
model{
    //priors would go here
    bin_for_obs ~ ordered_logistic( 
        (mu[1]+mu[2]*X) 
        ./  
        exp(sigma[1]+sigma[2]*X) 
        , cutpoints 
    ) ;
}

Similarly, if you have the above plus some other observed data variable Y and you want to let the count data and X jointly inform on Y, you can use a structural-equation-style latent variable model:

data{
    int n_bins_plus_2 ;
    vector[n_bins_plus_2-1] cutpoints ;
    int n_obs ;
    array[n_obs] int<lower=1,upper= n_bins_plus_2 > bin_for_obs ;
    vector[n_obs] X ;
    vector[n_obs] Y ;
}
parameters{
    vector[3] mu ;
    vector[3] <lower=0> sigma ;
    vector[n_obs] Z ; 
    real<lower=0> Z_X ; // must be positive for identifiability
    real<lower=-1, upper=1> Z_bin; 
    real Z_Y;
    vector[n_obs] bin_unique ; 
}
model{
    // other priors would go here
    bin_unique ~ std_normal() ;  // must be std_normal for identifiability
    Z ~ std_normal() ;  // must be std_normal for identifiability
    X ~ normal( mu[1] + Z*Z_X , sigma[1] ) ;
    bin_for_obs ~ ordered_logistic( 
        mu[2] + (
          Z*Z_bin + bin_unique*(1-Z_bin)
        )*sigma[2]
        , cutpoints 
    ) ;
    Y ~ normal( mu[3] + Z*Z_Y , sigma[3] ) ;
}

What do you think @stevebronder ?

Topic		Replies	Views
Two-steps estimation in poisson model Modeling rstan , fitting-issues	11	632	May 11, 2021
Running time problem Modeling	3	676	June 7, 2017
Issue with model and transformed parameters Modeling specification , performance	7	1620	May 23, 2017
Join two cluster in a mixed distribution Modeling	3	654	July 22, 2017
Code efficiency wrt the transformed parameters blog Modeling techniques	9	621	October 25, 2019

Implementing two-step estimation

Related Topics