Implementing two-step estimation

I am trying to implement a two-step estimator to account for my DV being discretized. In the first step, I need to estimate the mean av_y and st. dev. \sigma_y of the underlying normal distribution using the midpoints of the bins y_{mp} for N observations. I do that with something like this:

model {
  ymp ~ normal( avy, sigmay); 
  avy ~ normal(muavy, sigmaavy); 
  sigmay ~ cauchy(musigmay, sigmasigmay);  
  // then some hierarchical priors for muavy and sigmaavy
}

Then I need to combine these estimates with the bin edges X_n and X_n^{+1} to create a new dependent variable, the expected value of y conditional on its bin:
y_{EC} = av_y + \sigma_y \frac{\phi( (X_n - av_y) / \sigma_y ) - \phi( (X_n^{+1} - av_y) / \sigma_y ) } { \Phi( (X_n^{+1} - av_y) / \sigma_y ) - \Phi( (X_n - av_y) / \sigma_y ) }
and estimate the second-step parameters (say \alpha and \sigma_\zeta) in a way that accounts for the uncertainty in the estimates of av_y and \sigma_y. Let’s say for simplicity that the second-stage model is y_{EC} \sim N(\alpha, \sigma_\zeta). The last thing I tried is

transformed parameters{
  vector[N] yEC;
  for (n in 1:N) {
     yEC[n] = avy + sigmay *
                ( exp(normal_lpdf((X[y[n]  ] - avy)/sigmay | 0, 1)) -
                  exp(normal_lpdf((X[y[n]+1] - avy)/sigmay | 0, 1)) ) / 
                ( exp(normal_lcdf((X[y[n]+1] - avy)/sigmay | 0, 1)) -
                  exp(normal_lcdf((X[y[n]  ] - avy)/sigmay | 0, 1)) );
  }
}
model {
    target += normal_lpdf( avy | muavy, sigmaavy);
    target += normal_lpdf( sigmay | musigmay, sigmasigmay);
    target += normal_lpdf( yEC  | alpha , sigmazeta);        
      
  // priors for parameters alpha and sigmazeta
}

The true model is quite a bit more complicated, but I believe this code gets to the core of my question.

My problem is that the av_y estimated in the second stage has a mean far lower than muavy, and this problem does not happen if I use the original data to estimate the second step directly (pretending the bin midpoints are the actual data). I originally used a single program to estimate both steps, but got the bias. I then ran the first step on its own, processed the chains to compute muavy, sigmaavy, musigmay, and sigmasigmay, and passed these as data to the second step. The bias survives.
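
For concreteness, the declarations in the second-step program look roughly like the sketch below (K is a placeholder name I am using for the number of bins; the other names match the snippets above):

data {
  int<lower=1> N;
  int<lower=2> K;                    // number of bins (placeholder)
  vector[K+1] X;                     // bin edges, increasing
  array[N] int<lower=1, upper=K> y;  // bin index for each observation
  real muavy;                        // first-step posterior mean of avy
  real<lower=0> sigmaavy;            // first-step posterior sd of avy
  real musigmay;                     // first-step posterior mean of sigmay
  real<lower=0> sigmasigmay;         // first-step posterior sd of sigmay
}
parameters {
  real avy;
  real<lower=0> sigmay;
  real alpha;
  real<lower=0> sigmazeta;
}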

I suspect that the likelihood of y_{EC} \sim N(\alpha, \sigma_\zeta) is pulling the mean of av_y towards zero. That is, I cannot fully separate the two models.

Thanks in advance.

Apologies if I'm misreading, but this sounds like a measurement error model? The docs below may be useful if so.


@stevebronder, thanks for the prompt reply and for the recommendation. This indeed can be regarded as a measurement error model. I previously tried to use data augmentation to estimate the true underlying values of my DV, but I could not get significant estimates. I suspect that the bins are narrow enough that the distribution within each bin is quite flat and therefore the uncertainty around the imputed values is just too high. Basically, I do not have enough information and cannot use the distribution shape to identify the underlying values.
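
Roughly, what I mean by data augmentation is treating the position of each observation inside its bin as a parameter, along the lines of this sketch (u is a placeholder name; X and y are the bin edges and bin labels as in my first post):

parameters {
  vector<lower=0, upper=1>[N] u;   // latent position of each observation within its bin
  real avy;
  real<lower=0> sigmay;
}
transformed parameters {
  vector[N] y_latent;              // imputed underlying value of the DV
  for (n in 1:N)
    y_latent[n] = X[y[n]] + u[n] * (X[y[n]+1] - X[y[n]]);
}
model {
  // bin edges are data, so the Jacobian of this rescaling is constant and can be dropped
  y_latent ~ normal(avy, sigmay);
  // ... priors and the rest of the model
}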

So, instead, I thought of estimating y_{EC}, the expected value of y conditional on the bin, and then using that as a data point (accounting for the spread of its distribution). In the frequentist world, that generates consistent estimates. See:

Stewart, Mark B. “On least squares estimation when the dependent variable is grouped.” The Review of Economic Studies 50, no. 4 (1983): 737-753.

I realize I am mixing consistency with Bayesian methods, but I have not found a better option yet.

Oh! If I understand this right, then you might save yourself some time by using Stan’s ordered distributions. If you know the cutpoints that define the bins, you can submit them as data along with the bin label for each observation, leaving you to do inference on the eta parameter, which I think you can then hack to achieve inference on both the location and scale of a latent normal. Something like:

data{
    int n_bins_plus_2 ; //plus2 bc first and last bins go to -Inf/+Inf, respectively 
    vector[n_bins_plus_2-1] cutpoints ; 
    int n_obs ;
    array[n_obs] int<lower=1,upper= n_bins_plus_2 > bin_for_obs ; // remember that no observation should have the label “1”
}
parameters{
    real mu ;
    real<lower=0> sigma ;
}
model{
    bin_for_obs ~ ordered_logistic( rep_vector(mu/sigma, n_obs) , cutpoints ) ;
}

And then, should you have some other observed data X that you want to include as a predictor to assess the relationship between X and your binned outcome, you can do:

data{
    int n_bins_plus_2 ;
    vector[n_bins_plus_2-1] cutpoints ;
    int n_obs ;
    array[n_obs] int<lower=1,upper= n_bins_plus_2 > bin_for_obs ;
    vector[n_obs] X ;
}
parameters{
    vector[2] mu ; // intercept and effect-of-X on mu
    vector[2] log_sigma ; // intercept and effect-of-X on log_sigma
}
model{
    //priors would go here
    bin_for_obs ~ ordered_logistic( 
        (mu[1]+mu[2]*X) 
        ./  
        exp(log_sigma[1]+log_sigma[2]*X) 
        , cutpoints 
    ) ;
}

Similarly, if you have the above plus some other observed data variable Y and you want to let the binned outcome and X jointly inform on Y, you can use a structural-equation-style latent variable model:

data{
    int n_bins_plus_2 ;
    vector[n_bins_plus_2-1] cutpoints ;
    int n_obs ;
    array[n_obs] int<lower=1,upper= n_bins_plus_2 > bin_for_obs ;
    vector[n_obs] X ;
    vector[n_obs] Y ;
}
parameters{
    vector[3] mu ; // locations for X, the binned outcome, and Y
    vector<lower=0>[3] sigma ; // scales for X, the binned outcome, and Y
    vector[n_obs] Z ; // latent factor shared by X, the binned outcome, and Y
    real<lower=0> Z_X ; // must be positive for identifiability
    real<lower=-1, upper=1> Z_bin; 
    real Z_Y;
    vector[n_obs] bin_unique ; // latent component unique to the binned outcome
}
model{
    // other priors would go here
    bin_unique ~ std_normal() ;  // must be std_normal for identifiability
    Z ~ std_normal() ;  // must be std_normal for identifiability
    X ~ normal( mu[1] + Z*Z_X , sigma[1] ) ;
    bin_for_obs ~ ordered_logistic( 
        mu[2] + (
          Z*Z_bin + bin_unique*(1-Z_bin)
        )*sigma[2]
        , cutpoints 
    ) ;
    Y ~ normal( mu[3] + Z*Z_Y , sigma[3] ) ;
}

What do you think @stevebronder ?


This is really helpful for a number of applications!

If I can ask one noob question: should the ‘model’ block be the ‘parameters’ block and vice versa?

Ha, good catch, mixed them up somehow 🤦‍♂️

Edited my post to fix it and avoid confusing others.


I am afraid I sacrificed important parts of my problem description for the sake of conciseness and clarity. The model I tried to describe is the first equation of a system, which in its simplest, original form looks like
y_{it}=\pi_{it}+\zeta_{it} + \epsilon_{it}
x_{it} = \phi \zeta_{it} + \psi \epsilon_{it} + \xi_{it},
where \pi and \zeta are latent variables, i indexes observation units and t indexes time.

This model is identified when y is a continuous variable, but in my data y is discretized. I actually did try to code the first equation as an interval regression, which I believe is what you are proposing. I had to give that up because I need to recover some estimate of \epsilon to estimate the second equation, and I was not able to do that. I suspect discretization removes the information associated with \epsilon. Please correct me if I am wrong and it is possible to extend your code to draw an estimate of \epsilon.

In the meantime, I will use the interval regression you propose to estimate the mean and spread of the distribution of y.
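
By interval regression I mean something along the lines of the sketch below, written with a normal latent variable rather than the ordered logistic (K is a placeholder for the number of bins and the priors are placeholders; the other names match my earlier snippets):

data {
  int<lower=1> N;
  int<lower=2> K;                    // number of bins (placeholder)
  vector[K+1] X;                     // bin edges, increasing
  array[N] int<lower=1, upper=K> y;  // bin index per observation
}
parameters {
  real avy;
  real<lower=0> sigmay;
}
model {
  avy ~ normal(0, 10);               // placeholder priors
  sigmay ~ normal(0, 10);
  for (n in 1:N)
    target += log_diff_exp(normal_lcdf(X[y[n]+1] | avy, sigmay),
                           normal_lcdf(X[y[n]]   | avy, sigmay));
}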

Have you confirmed this with simulations? I feel like even in the case of no binning of y, \zeta and \epsilon are under-identified thanks to their additive role in both equations. The fact that they have equal weight for y and possibly-unequal weights in x helps, but under circumstances where x also supports equal weights, they’ll certainly be under-identified.
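
For instance, taking a single cross-section and assuming (these are my assumptions, not necessarily yours) that \pi, \zeta, \epsilon and \xi are mutually independent and mean-zero, the observable second moments are only
Var(y) = \sigma_\pi^2 + \sigma_\zeta^2 + \sigma_\epsilon^2
Var(x) = \phi^2 \sigma_\zeta^2 + \psi^2 \sigma_\epsilon^2 + \sigma_\xi^2
Cov(y, x) = \phi \sigma_\zeta^2 + \psi \sigma_\epsilon^2,
which is three equations in six unknowns, so the identification has to be coming from somewhere else (the panel structure, sign/scale restrictions, or informative priors).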

I did not do the simulation, but I estimated the model using the midpoints of the bins for y. The chains converge and are stable if I impose a sign restriction on \phi (theory implies it should be positive). The estimates are also consistent with estimates published in a couple of papers for the same model with different data and a continuous y. One of those papers demonstrated identification theoretically for the case in which y is continuous.