Changing regression response variable from Gaussian to logistic

You won't get a realization of G in transformed parameters; you'll get it as a parameter.

parameters{
  vector[n_obs] G;
}
transformed parameters{
  vector[n_obs] ystar = X*beta + G;
}
model{
 G  ~ multi_normal_cholesky(rep_vector(0, n_obs), sigma * cholesky_decompose(V));
}
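
As an aside, a note on the scaling in this snippet (my reading, not part of the original reply): multi_normal_cholesky expects the Cholesky factor of the covariance, so scaling that factor by sigma implies

$$
G \sim \mathcal{N}\!\big(0,\ (\sigma L)(\sigma L)^\top\big) = \mathcal{N}\!\big(0,\ \sigma^2 V\big), \qquad LL^\top = V,
$$

i.e. sigma enters the covariance as sigma squared.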

Ah ok 👌 got it. I will experiment!


So I modified the model to this:

parameters {
    real<lower=0.0> sigma;
    real<lower=0.0> lambda;
    vector[n_cov] beta;
    vector[n_obs] G;
    vector[n_obs] e;

    vector<lower=0.0>[n_exp] r;
}

transformed parameters{
  vector[n_obs] ystar = X*beta + G + e;
}

model {
    // local params
    matrix[n_obs, n_obs] V;

    V = add_diag( lambda * exp(-cov_Kzr(Z, r)) , 1.0 ) ;

    // Priors
    sigma ~ inv_gamma(1, 0.25);
    lambda ~ inv_gamma(10, 10);
    r ~ cauchy(0, 0.1);
    beta ~ std_normal();
    e ~ std_normal();

    // Likelihood
    //ystar ~ multi_normal_cholesky(X*beta, sigma * cholesky_decompose(V));
    G  ~ multi_normal_cholesky(rep_vector(0, n_obs), sigma * cholesky_decompose(V));
    y ~ bernoulli_logit( ystar );

}

It seems to fit better in that I don't get divergences or treedepth warnings; however, I do get E-BFMI warnings:

I tried different priors on e, but anything other than std_normal() seemed to poison the model. I'm also wondering if it's ok just to discard e as I'm doing. Should it be added back in somewhere?

Is this the approach you intended? Just wondering if I understood correctly.

Edit: I added a group-level intercept and it seems to improve things a bit:

parameters {
...
    real Intcpt;
}

transformed parameters{
  vector[n_obs] ystar = Intcpt + X*beta + G + e;
}

@jroon Just want to throw out there that the code below doesn't give me crazy levels of convergence issues, although sigma is still pretty unidentified:

data {
  int<lower=0> N;
  vector[N] x;
  array[N] int<lower=0, upper=1> y;
}

parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
  vector[N] z;
}

transformed parameters {
  vector[N] y_star;
  
  y_star = alpha + beta*x + sigma*z;
}

model {
  y ~ bernoulli_logit(y_star);

  alpha ~ std_normal();
  beta ~ std_normal();
  sigma ~ std_normal();
  z ~ std_normal();
}

Some example simulation code:

set.seed(1234)

N <- 100
alpha <- 1
beta <- 0.5
sigma <- 0.1

x <- rnorm(N)
y_star <- rnorm(N, alpha + beta * x, sigma)
y <- rbinom(N, size=1, prob=plogis(y_star))

m <- cmdstanr::cmdstan_model(stan_file="bernoulli-gaussian.stan")

fit <- m$sample(
    data=list(N=N, x=x, y=y),
    iter_warmup=1000,
    iter_sampling=1000,
    parallel_chains=4
)

This model still includes the e term, but it should be tractable because you are not trying to estimate the variance of that term; you have fixed the variance to 1 by assumption. Is that what you intended?
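
In symbols, the contrast being drawn here (my paraphrase):

$$
e_i \sim \mathcal{N}(0, 1)\ \text{(scale fixed by construction)} \qquad \text{vs.} \qquad e_i \sim \mathcal{N}(0, \sigma_e^2)\ \text{with } \sigma_e \text{ estimated.}
$$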

More broadly, my question for you is (why) do you need the e term?


Thanks @cgoold. Unless I'm missing something, this model is analogous to the model in my previous post, but missing the GP term?

You suggest a prior on the variance of sigma? Simply didn’t occur to me to be honest. I will add it and see what happens.

Well I don’t as such. Decomposing the GP into X*beta + G + e wasn’t something I knew I could do until you suggested it. What I’m interested in estimating is G (well, really V), and to a lesser extent beta. As such e is just something to help me fit the model to the binomial outcome data.
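
For readers following along, a sketch of the decomposition being discussed, in my own notation (assuming a multivariate-normal prior on G and a unit-variance normal prior on e):

$$
y^* = X\beta + G + e, \quad G \sim \mathcal{N}(0, \Sigma_G), \quad e \sim \mathcal{N}(0, I_n)
\quad\Longleftrightarrow\quad
y^* \sim \mathcal{N}(X\beta,\ \Sigma_G + I_n).
$$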

Get rid of the e! You don’t need it!

Ok I am confused! Do you mean remove it as a parameter completely? (in which case I’m confused because I only put it in at your suggestion), or do you mean do this:

y ~ bernoulli_logit( ystar  - e );

Sorry, I think we’ve been talking past each other for a while.

Your model from post #3 didn’t have an e term in it at all. However, your original model from post #1 did have an e term, as did your brms model from post #8. In post #11, I suggested an alternative brms model that doesn’t have the e term, but in post #12 you suggested that this model isn’t the one you want. I never understood why that would be the case.

We then spent several posts discussing how to isolate the e term from the gaussian process term, but my goal in stepping through this was to show why e would almost always cause problems when its variance is a parameter to be estimated. My advice would be to remove it.

In post #23, you show a model that includes the e term, but does not attempt to use the data to estimate the variance of that term; you fix the variance to 1 by construction in that model with e ~ std_normal();. That should remove the non-identifiability present in your models from posts #1 and #8, but I still see no reason to include this term.
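
One way to see the weak identification (a standard logistic-normal approximation, not from this thread): with $e_i \sim \mathcal{N}(0, \sigma_e^2)$ added on the logit scale and $\mu_i$ denoting the rest of the linear predictor,

$$
\Pr(y_i = 1) = \mathbb{E}\big[\operatorname{logit}^{-1}(\mu_i + e_i)\big] \approx \operatorname{logit}^{-1}\!\left(\frac{\mu_i}{\sqrt{1 + c^2\sigma_e^2}}\right), \qquad c = \tfrac{16\sqrt{3}}{15\pi} \approx 0.588,
$$

so increasing $\sigma_e$ is nearly indistinguishable from shrinking beta and G toward zero, which is why a free $\sigma_e$ tends to mix badly; fixing it to 1 removes that trade-off.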


Hi. Yes, you are right, we were talking past each other. Sorry, in my efforts to simplify the model I introduced things I didn't intend. Let's do a soft restart :)

So the model now is:

functions {
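    // Pairwise weighted squared distances: K[i, j] = sum_k r[k] * square(Z[i, k] - Z[j, k])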
    matrix cov_Kzr(matrix Z, vector r) {
        int n_obs = dims(Z)[1];
        int n_exp = dims(Z)[2];
        matrix[n_obs, n_exp] Zr = Z .* rep_matrix(sqrt(r'), n_obs);
        matrix[n_obs, n_obs] K = rep_matrix(0.0, n_obs, n_obs);

        for (i in 1:n_obs){
            for (j in 1:n_obs){
                if (i < j){
                    for (k in 1:n_exp){
                        K[i, j] += square( Zr[i, k] - Zr[j, k] );
                    }
                    K[j, i] = K[i, j];
                }
            }
        }
        return K;
    }
    
    // Calculate distance/covariance matrix for exposures at new values 
    matrix cov_Kz2r(matrix Z1, matrix Z2, vector r) {
        // Define variables
        int n_obs = dims(Z1)[1];
        int n_exp = dims(Z1)[2];
        int nobs2 = dims(Z2)[1];
        matrix[n_obs, n_exp] Z1r = Z1 .* rep_matrix(sqrt(r'), n_obs);
        matrix[nobs2, n_exp] Z2r = Z2 .* rep_matrix(sqrt(r'), nobs2);
        matrix[n_obs, nobs2] K = rep_matrix(0.0, n_obs, nobs2);

        // calculate K - cycling through all rows because K may not be square here
        for (i in 1:n_obs){
            for (j in 1:nobs2){
                for (k in 1:n_exp){
                    K[i, j] += square( Z1r[i, k] - Z2r[j, k] );
                }
            }
        }
        return K;
    }
 
}

data {
    int n_obs;
    int n_exp;
    int n_cov;

    int y[n_obs];
    matrix[n_obs, n_exp] Z;
    matrix[n_obs, n_cov] X;
}

parameters {
    real<lower=0.0> sigma;
    real<lower=0.0> lambda;
    vector[n_cov] beta;
    vector[n_obs] G;
    //vector[n_obs] e;

    vector<lower=0.0>[n_exp] r;
}

transformed parameters{
  vector[n_obs] ystar = X*beta + G; // + e;
}

model {
    // local params
    matrix[n_obs, n_obs] V;

    V = add_diag( lambda * exp(-cov_Kzr(Z, r)) , 1.0 ) ;

    // Priors
    sigma ~ inv_gamma(1, 0.25);
    lambda ~ inv_gamma(10, 10);
    r ~ cauchy(0, 1.0);
    beta ~ std_normal();
    //e ~ std_normal();

    // Likelihood
    //ystar ~ multi_normal_cholesky(X*beta, sigma * cholesky_decompose(V));
    G  ~ multi_normal_cholesky(rep_vector(0, n_obs), sigma * cholesky_decompose(V));
    y ~ bernoulli_logit( ystar );
}
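
For reference, the covariance this code constructs, as I read cov_Kzr and the add_diag call (with exp taken elementwise):

$$
K_{ij} = \sum_{k=1}^{n_\text{exp}} r_k\,(Z_{ik} - Z_{jk})^2, \qquad V = \lambda \exp(-K) + I, \qquad G \sim \mathcal{N}\!\big(0,\ \sigma^2 V\big).
$$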

This version has resolved the divergences I had to start with, but now I get chain mixing, ESS, and E-BFMI warnings: