I’m trying to do something similar to a hierarchical logistic regression. For each binary label, I have a variable number of continuous predictors. Ideally, I’d also be able to make predictions for held-out labels using only the continuous predictors.
Below is a simplified version of the model I came up with, but it doesn’t seem right because it models both data sources equally, even though the label is the variable of interest. On the plus side, this way I can hold out labels and make predictions for them.
I keep trying to think of another way to have the label variable be drawn from a distribution that the continuous variable also provides information about, but I can’t think of any that don’t boil down to the same thing…
Thank you!
data {
  int<lower=1> N_di;                  // total number of observations
  // the data
  int<lower=0, upper=1> label[N_di];
  vector[N_di] cnts;
  // priors etc.
  real<lower=0> cnts_sd_manual;
}
parameters {
  vector<lower=0, upper=1>[N_di] p_di;
  real<lower=0, upper=1> intercept;
  real<lower=0, upper=1> slope;
}
transformed parameters {
  vector[N_di] cnts_mean = intercept + slope * p_di;
}
model {
  p_di ~ beta(1, 1);
  slope ~ normal(0, 0.2);
  intercept ~ normal(0, 0.2);
  // models for the data
  label ~ bernoulli(p_di);
  target += normal_lpdf(cnts | cnts_mean, cnts_sd_manual);
}
You’re trying to model the label variable, right? (Since it’s data, I don’t think you need to constrain it; probably just check before passing it into Stan that it’s really binary.) But do you need to calculate the likelihood of the cnts variable at all?
Also, you are using p_di as a vector of parameters, one per label, without any additional model for it, and you use the same vector inside a linear predictor. It seems like an unusual constraint.
So I’m not clear on what exactly you are trying to achieve. In logistic regression you’d have an x axis that goes into the linear predictor and through the logistic function to give the likelihood of the labels along that axis; the hierarchical part would come on top of that. Here you have a linear regression and a Bernoulli likelihood side by side, not connected in any meaningful way.
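As a minimal sketch of that structure (non-hierarchical; the names x, alpha, and beta are hypothetical), the predictor feeds the label likelihood directly:

```stan
data {
  int<lower=1> N;
  vector[N] x;                      // continuous predictor ("x axis")
  int<lower=0, upper=1> label[N];  // binary outcome
}
parameters {
  real alpha;  // intercept
  real beta;   // slope
}
model {
  alpha ~ normal(0, 2);
  beta ~ normal(0, 2);
  // the linear predictor goes through the inverse logit into the label likelihood
  label ~ bernoulli_logit(alpha + beta * x);
}
```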
Correct me if I’m wrong, but here you are estimating a probability based on a single binary observation (many times over, but each time one parameter for one observation); I’d say each p_di would be mostly constrained by the cnts likelihood.
Maybe I’m missing some detail, but could you clarify a bit?
Thanks for the reply! Yes, it is a bit unusual. I don’t know if it’s the best model for what I’m trying to do. The thing I’m interested in obtaining is the probability p_di for each observation. I expect that if p_di is higher, then label is more likely to be 1, and also that if p_di is higher, cnts is higher. If there were only a fixed set of continuous predictors per observation, I could do a regular logistic regression to get p_di.
But there can be a variable number of predictors per observation. So my idea was to link the two inputs using the same modeled parameter, but I’m not super satisfied with my solution.
Could you give an example of substantively what you’re trying to model? For example, what are label and cnts in the real world? That might help clarify the issue.
Could you also elaborate on what you mean by…
This seems like a separate issue from the example model, but maybe that is where you have simplified it.
Ok, thanks. The setup is something like this: we have a set of patients, and we are trying to predict the probability (p_di) that a treatment will work (label). We predict this based on their similarity to other patients for whom the treatment worked (cnts, a measure of that similarity). As inputs we have only the similarity scores and the labels, and there is a variable number of similarity scores (cnts values) per patient. Does that make sense? The simplification was that I included only one predictor per patient in the sample code, but it can actually be any number >= 1. Thank you!
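One common Stan pattern for a variable number of scores per patient is ragged indexing: flatten all the scores into one long vector plus per-patient start/length indices. A sketch under that setup (names are hypothetical, and collapsing each patient’s scores to their mean is just one possible choice):

```stan
data {
  int<lower=1> N_pat;                  // number of patients
  int<lower=1> N_cnts;                 // total similarity scores across all patients
  vector[N_cnts] cnts_flat;            // all scores, concatenated patient by patient
  int<lower=1> start[N_pat];           // index where patient p's scores begin
  int<lower=1> len[N_pat];             // number of scores for patient p
  int<lower=0, upper=1> label[N_pat];  // did the treatment work?
}
parameters {
  real alpha;
  real beta;
}
model {
  alpha ~ normal(0, 2);
  beta ~ normal(0, 2);
  for (p in 1:N_pat) {
    // summarize patient p's ragged block of scores, e.g. by its mean
    real sim_p = mean(segment(cnts_flat, start[p], len[p]));
    label[p] ~ bernoulli_logit(alpha + beta * sim_p);
  }
}
```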
Thanks, that’s very helpful. There’s something about the setup that doesn’t seem quite right to me, but I haven’t figured out what it is. I think it has to do with the idea that you’re only using similarity to individuals with successful treatment.
Do the different similarity measures align with specific individuals? For example, is it that you have similarity_bob, similarity_joe, etc., corresponding to the similarity to Bob and Joe? If so, you might be able to do something akin to FIML in SEM (see Missing Data and Partially Known Parameters for a very simple example).
Sorry for the delay, and thanks for the suggestion. I looked into it and I see your thinking, but I don’t think it makes sense to treat this as a missing-data problem, because each treatment has a different set of individuals for whom it worked, and there are over 100 people. Please let me know if any other ideas come to mind!
What you want is the posterior predictive for a single case, right? Intuitively, I’d want the logistic regression model of the predictors of treatment success in the model block, and then move the posterior prediction of treatment success for a case to generated quantities. I’d then simulate with bernoulli_rng in generated quantities, conditional on the fitted parameters and the case-level predictors. That would give you a posterior predictive probability of treatment success for a given case. But I may be missing something here?
Thanks, this comment is in response to the part about making predictions for held-out labels, right? I think I understand, but I am not sure what the difference would be between making the prediction in the generated quantities block and in the model block; these lines were in the model block.
Yes, that’s what I had in mind. One advantage is that you could fit the model once and then run standalone generated quantities to predict for new cases. But forget about bernoulli_rng; you would of course want the inverse logit to get a more interpretable posterior (#notastatistician).
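Concretely, assuming a logistic model with intercept alpha and slope beta (hypothetical names), the generated quantities block for a held-out case might look like:

```stan
generated quantities {
  // posterior predictive probability of treatment success for a new case,
  // given its predictor value x_new supplied in the data block
  real p_new = inv_logit(alpha + beta * x_new);
  // optionally also simulate a binary outcome
  int label_new = bernoulli_rng(p_new);
}
```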