Censored data with survey weights in mixture model

I am modeling income data from the US Census CPS ASEC survey as a two-component exponential-lognormal mixture model, where the bulk of the data (including the right tail) is in the lognormal component. The data are top-coded at 150000 and come with survey weights, which are non-integer values. The Stan model I’m using is:

data {
  int<lower=0> N_obs;
  int<lower=0> N_cens;
  real y_obs[N_obs];
  real<lower=max(y_obs)> U;
  vector<lower=0>[N_obs] weights;  // survey weights 
}
parameters {
  real<lower=U> y_cens[N_cens];
  real<lower=0,upper=1> lambda;     // mixing proportion (weight of the exponential component)
  real<lower=0> mu;     // log-scale location of lognormal
  real<lower=0> sigma;  // log-scale sd of lognormal
  real<lower=0> alpha;  // rate (inverse scale) of exponential
} 
model {
  for (n in 1:N_obs) {
    target += weights[n] * log_sum_exp(log(lambda) + exponential_lpdf(y_obs[n] | alpha),
                                       log1m(lambda) + lognormal_lpdf(y_obs[n] | mu, sigma));
  }
  y_cens ~ lognormal(mu, sigma);
}
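
Written out, the model block adds the following to the log density. The symbols $w_n$ for `weights[n]` and $y^{\text{cens}}_j$ for the elements of `y_cens` are just my shorthand here, and since no priors are specified they are implicitly flat over the declared bounds:

```latex
\log p(\lambda, \mu, \sigma, \alpha, y^{\text{cens}} \mid y^{\text{obs}})
  = \text{const}
  + \sum_{n=1}^{N_{\text{obs}}} w_n \,
    \log\!\Bigl[ \lambda \, \mathrm{Exponential}(y_n \mid \alpha)
               + (1 - \lambda) \, \mathrm{LogNormal}(y_n \mid \mu, \sigma) \Bigr]
  + \sum_{j=1}^{N_{\text{cens}}}
    \log \mathrm{LogNormal}\bigl(y^{\text{cens}}_j \mid \mu, \sigma\bigr),
  \qquad y^{\text{cens}}_j \ge U .
```

So the observed records enter as a weighted pseudo-likelihood, while each of the `N_cens` imputed values enters with weight 1 and only through the lognormal component.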

The survey weight for incomes at or above 150000 is 465.6025, which I’m rounding up to an integer value for N_cens, so the data list is:

data_list = list(y_obs = WSAL_VAL, N_obs = length(WSAL_VAL), U=150000, N_cens = 466, weights=MARSUPWT)

I’m getting the results I expect, but I’m curious whether anyone has a suggestion for a better way to handle survey weights with censored data, or whether this is a reasonable approach. Thanks.

Andrew has written a lot on questions like this. The gist is to adjust after the fact (using poststratification) rather than weighting the likelihood with survey design weights. Here is the most recent article (although I guess in your case you know the cluster sizes):
http://www.stat.columbia.edu/~gelman/research/published/clustersampling.pdf
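
On the censoring part specifically, one possible alternative is to integrate the top-coded values out instead of sampling them as parameters, which lets the censored weight enter without rounding. This is only a minimal sketch, assuming (as in your model) that the top-coded mass belongs to the lognormal component alone. `W_cens` is a name I made up for the total weight of the top-coded records, `y_obs` is declared as a vector rather than an array for convenience, and `log_mix` is shorthand for your `log_sum_exp` expression:

```stan
data {
  int<lower=0> N_obs;
  vector[N_obs] y_obs;
  real<lower=max(y_obs)> U;          // top-coding threshold
  vector<lower=0>[N_obs] weights;    // survey weights for observed records
  real<lower=0> W_cens;              // hypothetical: total weight of top-coded records
}
parameters {
  real<lower=0, upper=1> lambda;     // mixing proportion (exponential component)
  real<lower=0> mu;                  // log-scale location of lognormal
  real<lower=0> sigma;               // log-scale sd of lognormal
  real<lower=0> alpha;               // rate (inverse scale) of exponential
}
model {
  // weighted mixture log-likelihood for the observed incomes
  for (n in 1:N_obs) {
    target += weights[n] * log_mix(lambda,
                                   exponential_lpdf(y_obs[n] | alpha),
                                   lognormal_lpdf(y_obs[n] | mu, sigma));
  }
  // censored records: log Pr[y > U] under the lognormal, scaled by their weight
  target += W_cens * lognormal_lccdf(U | mu, sigma);
}
```

With this version the data list would drop `N_cens` and pass `W_cens = 465.6025` directly. For an integer weight it gives the same posterior for the remaining parameters as sampling `y_cens`, since integrating each lognormal density over values above `U` gives exactly the complementary CDF term.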
