Is it ok to use the same independent outcome variable twice in a model?

bennyb · June 4, 2024, 9:19pm

Your hunch re unbiased estimates makes sense.

I’ve started checking what you suggest (95% incl true value), but on my actual model instead of the simplified simulations and that is taking a while to fully run, but the first results indicate no issues.
The sample sizes for my actual model are around n = 50.

Interesting to hear about the results of your simulation though, I’ll keep that in mind when analyzing my results.

One reason I was coming back to the double likelihood approach is that @jsocolar’s suggestion started struggling when the coefficient is close to 0. It seems like that introduces unidentifiability between psi and the coefficient. I’ll try your integration approach as well though, maybe that’s more robust somehow.

bennyb · June 4, 2024, 10:29pm

The integration approach seems to do the trick, it recovers all parameters and doesn’t struggle when the coefficient is (close to) 0.

Here is the code I used (with a rough integration approach that I’m sure can be improved, but it does the job).

functions { 

              real normal_pdf(real x, real mu, real sigma) {
                     real coef = 1 / sqrt(2 * pi() * sigma * sigma);
                     real exponent = -0.5 * square((x - mu) / sigma);
                     return coef * exp(exponent);
              }

              real mean_logit(real lower_bound, real upper_bound, real alpha, real beta, real X_mu, real X_sigma, int steps) {
                     real step_size = (upper_bound - lower_bound) / steps;
                     real integral;

                     integral = 0.5 * step_size * normal_pdf(X_mu , X_mu, X_sigma) * inv_logit(alpha + X_mu * beta);

                     for(i in 1:(steps-1)){
                            real x = lower_bound + i * step_size;
                            integral += step_size * normal_pdf(x , X_mu, X_sigma) * inv_logit(alpha + x * beta);
                     }

                     return integral;
              }
       }


       data {
              int N; 
              int y[N];
              vector[N] X;
              int N_u;              
              int u[N_u];
       }

       parameters {
              real alpha;  
              real beta;
              real<lower=0,upper=1> psi;
              real X_mu;
              real X_sigma;
       }
       
       transformed parameters {
              vector<lower=0,upper=1>[N] mu;
              real<lower=0,upper=1> mu_unknown;
              
              mu = inv_logit(alpha + X * beta);
              mu_unknown = mean_logit(-5, 5, alpha, beta, X_mu, X_sigma, 100);
       }

       model {
              alpha ~ normal(0,5);
              beta ~ normal(0,5); 
              psi ~ beta(1,1);
              X_mu ~ normal(0,1);
              X_sigma ~ exponential(0.5);
              X ~ normal(X_mu, X_sigma);

              y ~ bernoulli( (1-psi) * mu);
              u ~ bernoulli(mu_unknown);
       }

Topic		Replies	Views
Multiple likelihoods with same outcome, shared parameter Modeling techniques	5	221	May 28, 2024
Double definition of response variable allowed? Modeling rstan , techniques	0	131	May 10, 2024
(when) can an outcome variable be used twice? Modeling techniques	0	96	May 14, 2024
Modelling joint likelihood of two outcomes & parameter correlation Modeling rstan	1	531	August 26, 2022
Using two models to estimate the missing data in an independent variable General rstan	3	329	August 11, 2022

Is it ok to use the same independent outcome variable twice in a model?

Related topics