Set initial values between 0 and 1 with a log link, and other questions


I am fitting a model that uses the logit link function. One of the covariates for the outcome y (cov_1_for_y) has some unobserved values, and I am imputing those using a logit link as well.

This model works; however, what I would like to do is move from logit links to log links, both for the model of the outcome of interest (y) and for the imputation model.

Question 1. How can I set the initial values to be between 0 and 1 when I am using the log link? I am getting the same error described here: How to set initial value to make probability parameter be in the interval [0, 1] in a stan model, but I do not feel confident enough to adapt the response to my own problem. I commented on that topic but nobody responded, so I am now trying as a new topic.

Question 2. How do I correctly move to a log link in the imputation piece? The modeling part seems to work just fine when I use only complete data; however, I get the error from Question 1 when I expand the model to include more covariates. I have left commented-out pieces of code showing how I am applying the log link between the outcome y and the covariates, as well as the way I "think" the log link should be accomplished in the imputation piece.

Feedback is really appreciated.

modelstring <- "

data {
int N; //number of observations
int L; //number of unique groups 1

int y[N]; // outcome

//covariate with some values unobserved
int cov_1_for_y[N]; //  categorical variable: 0 if baseline category, 1 otherwise, (-1) if unobserved
int cov_1_for_y_miss[N]; // missingness indicator: 1 if cov_1_for_y is unobserved, 0 if observed

int cov_2_for_y[N]; //covariate with all values observed

// index of unique groups 1
int index_groups_1[N];

//data to impute cov_1_for_y
real cov_imp[N];
}

parameters {
real <lower=-10,upper=10> alpha;
real <lower=-10,upper=10> beta_cov_1_for_y; // coefficient 
real <lower=-10,upper=10> beta_cov_2_for_y; // coefficient 

real <lower=-10,upper=10> a_imp; // intercept for the imputation model 
real <lower=-10,upper=10> b_cov_imp; // coefficent for the imputation model 

real<lower=0,upper=100> sigma_theta_groups_1; // standard deviation across groups 1 - specific intercepts
vector[L] epsilon_groups_1;
vector[L] mu;
}

transformed parameters {
vector[L] theta_groups_1;

theta_groups_1 = mu + sigma_theta_groups_1 * epsilon_groups_1;
}

model {

// priors
alpha ~ normal(0,10); // explained above
beta_cov_1_for_y ~ normal(0,10); // explained above
beta_cov_2_for_y ~ normal(0,10); // explained above
a_imp ~ normal(0,10); // explained above
b_cov_imp ~ normal(0,10); // explained above


sigma_theta_groups_1~cauchy(0,15); //standard deviation across groups 1 specific intercepts

epsilon_groups_1 ~ normal(0, 1); // non-centered parameterization of the group-specific intercepts
//add Data 
for (i in 1:N) {

vector[2] p;
p[1] = a_imp + b_cov_imp*cov_imp[i]; // linear predictor for category 1 of the covariate
p[2] = 0; // reference category, so softmax(p) is well defined

if (cov_1_for_y_miss[i] == 0) {

y[i] ~ bernoulli_logit(alpha +
                       beta_cov_1_for_y*cov_1_for_y[i] +
                       beta_cov_2_for_y*cov_2_for_y[i] +
                       theta_groups_1[index_groups_1[i]]);

// log link
//y[i] ~  bernoulli(exp(alpha+
//                      beta_cov_1_for_y*cov_1_for_y[i]+
//                      beta_cov_2_for_y*cov_2_for_y[i]+
//                      theta_groups_1[index_groups_1[i]]));

cov_1_for_y[i] ~ bernoulli(softmax(p)[1]);

//log link?
//cov_1_for_y[i] ~ bernoulli(exp(p[1]));

} // close observed-covariate branch
// x missing cov_1_for_y_miss
else {

vector[2] log_lik_cats; // log probabilities for the two alternate scenarios (covariate category 1 or 0)

log_lik_cats[1] = log_softmax(p)[1] + bernoulli_logit_lpmf( y[i] | alpha + beta_cov_1_for_y + beta_cov_2_for_y*cov_2_for_y[i] + theta_groups_1[index_groups_1[i]]);  // category 1: recently weaned
log_lik_cats[2] = log_softmax(p)[2] + bernoulli_logit_lpmf( y[i] | alpha + beta_cov_2_for_y*cov_2_for_y[i] + theta_groups_1[index_groups_1[i]]);  // category 0: not recently weaned (note the complementary probability log_softmax(p)[2])

//log link?
//log_lik_cats[1] = log(p[1]) + bernoulli_lpmf( y[i] | exp(alpha + beta_cov_1_for_y + beta_cov_2_for_y*cov_2_for_y[i] + theta_groups_1[index_groups_1[i]]));  // category 1: recently weaned
//log_lik_cats[2] = log(p[1]) + bernoulli_lpmf( y[i] | exp(alpha + beta_cov_2_for_y*cov_2_for_y[i] + theta_groups_1[index_groups_1[i]]));  // category 0: not recently weaned

target += log_sum_exp(log_lik_cats); // sum log probabilities across the scenarios (i.e., marginalize over missingness)
} // close else (missing-covariate branch)
} // close loop over observations
} // close model block
"
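To convince myself that the marginalization in the else branch does what I intend, here is a small numeric check in Python rather than Stan (all numbers are made up, just to verify the algebra):

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_sum_exp(a, b):
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

# illustrative values, not output from the model
p_cat1 = 0.3       # P(cov_1_for_y == 1) from the imputation piece
eta_base = -0.5    # linear predictor for y when cov_1_for_y == 0
beta_cov1 = 0.8    # effect of cov_1_for_y
# the observed outcome is y[i] == 1 in this sketch

# per-scenario log probabilities, mirroring log_lik_cats in the Stan code
ll_cat1 = math.log(p_cat1) + math.log(inv_logit(eta_base + beta_cov1))
ll_cat0 = math.log(1.0 - p_cat1) + math.log(inv_logit(eta_base))

# marginal likelihood of y[i] after summing out the missing covariate
marginal = math.exp(log_sum_exp(ll_cat1, ll_cat0))

# the direct mixture computation must agree
direct = p_cat1 * inv_logit(eta_base + beta_cov1) + (1.0 - p_cat1) * inv_logit(eta_base)
```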

Since nobody else answered, I will give it a try.

I honestly do not understand what you are trying to achieve with the log link - would you care to elaborate? The thing is: the parameter of the Bernoulli distribution has to be in the interval [0,1], but the range of the exp function is all positive numbers - so unless you impose some very complex restrictions on your model coefficients, you will get invalid values. That is why people use the logit link - it never produces invalid values. If the logit is problematic for some reason, I would consider other functions that map between \mathbb{R} and [0,1] (e.g. the other sigmoid functions listed on Wikipedia).
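To see this numerically - a quick Python sketch with arbitrary values of the linear predictor:

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

for eta in [-3.0, -1.0, 0.0, 0.5, 2.0]:
    p_log = math.exp(eta)      # inverse of the log link
    p_logit = inv_logit(eta)   # inverse of the logit link
    # exp(eta) exceeds 1 as soon as eta > 0; inv_logit never leaves (0, 1)
    print(f"eta={eta:+.1f}  exp={p_log:.3f}  inv_logit={p_logit:.3f}")
```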

Some minor points:

  • real <lower=-10,upper=10> alpha; - hard bounds like this are generally discouraged in favor of soft bounds, i.e. if you put a N(0,5) prior on alpha, it will very strongly pull alpha to stay within [-10,10]
  • sigma_theta_groups_1 ~ cauchy(0,15); - this is an extremely wide prior; do you have a justification for it?
  • beta_cov_1_for_y ~ normal(0,10); - when working on the log (or logit) scale this is also an extremely wide prior - it says that effects of \exp(2 \times 10) \simeq 5 \times 10^8 are a priori plausible.
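To put a number on that last point (plain Python, just checking the magnitude):

```python
import math

# a coefficient two prior standard deviations from zero under normal(0, 10)
beta = 2 * 10
# on the log (or logit) scale this corresponds to multiplying by
effect_scale = math.exp(beta)
print(effect_scale)  # roughly 4.85e8
```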

Hope that helps!


Hi Martin:

First of all, thank you for your interest in my problem.

I will answer and ask things in the same order:

I am aware of the issues with the log link. There are two reasons to use it. First, my outcome is common in some of the strata, which means that if I want to present my results as prevalence ratios (cross-sectional study) I need to back-transform with the inverse logit, exp(a + bx)/(1 + exp(a + bx)). Because the disease is common, it is well described that this approach will overestimate the prevalence ratios (or the risk ratios, if this were a cohort or experimental study). The most common option described in Epi to avoid this problem is the log link, especially with only categorical covariates, as in my case. Leaving some references: the only reason to prioritize prevalence ratios over ORs is the more intuitive interpretation of the former. The problems with the log link have also been discussed here: Log binomial random intercept model
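To illustrate the overestimation with made-up prevalences (a Python sketch, not study data):

```python
# common outcome: prevalences in exposed and unexposed groups (illustrative)
p_exposed, p_unexposed = 0.60, 0.40

prevalence_ratio = p_exposed / p_unexposed
odds_ratio = (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))
# PR = 1.5 but OR = 2.25: for a common outcome the OR overstates the PR

# rare outcome: the two measures nearly coincide
p_exp_rare, p_unexp_rare = 0.015, 0.010
pr_rare = p_exp_rare / p_unexp_rare
or_rare = (p_exp_rare / (1 - p_exp_rare)) / (p_unexp_rare / (1 - p_unexp_rare))
```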

I checked the wikipedia link. Great resource, thank you! I have two concerns about moving to other functions: first, the typical interpretation of the coefficients, and second, the behavior of the estimated p given, again, the common outcome. On the first point, I do not want to use something that is unfamiliar and that nobody understands. On the second, I have no idea how the posterior predictive distribution would behave with a common outcome under these alternatives.

Yes! I definitely take the advice on the priors.
The cauchy comes from here.
Although I can definitely take suggestions on this.

Thanks again

I have little to add to what Mike and Bob said in the Log binomial post you linked to - I think they summarize the problem really well. I will just emphasize a few points:

  • You can make predictions/estimates of prevalence ratios from a logistic regression fit - you only need to know the values/ranges of the covariates for the cases you want to predict/estimate. So if you just want to estimate the prevalence ratios in your already-gathered dataset, it is straightforward.
  • You could in principle design mathematical constraints on the parameters to make the log link work (you need a + b * x < 0). I am not sure it makes sense, but the easiest way would be to declare real<upper=0> a_raw; as a parameter and then compute a = -b*x + a_raw. This might not necessarily behave well in practice, though.
  • The logit link behaves almost like a log link for negative values of the linear predictor and starts behaving noticeably differently only above -2 or so. You could design a link function that mirrors the log up to, say, -0.2 and only then starts to deviate so it does not exceed 1. This could be easier than designing the math constraints and could have a very similar effect on your inferences. It would, however, be challenging to interpret what the link function means ("mostly prevalence ratios, but then something different for some values").
library(dplyr)
library(tidyr)
library(ggplot2)

data.frame(x = seq(-5, 1, length.out = 100)) %>%
  mutate(f_exp = exp(x), f_inv_logit = 1 / (1 + exp(-x))) %>%
  gather("func", "value", -x) %>%
  ggplot(aes(x = x, y = value, color = func)) +
  geom_line() +
  geom_hline(yintercept = 1, color = "green", linetype = "dashed")
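And the first bullet can be sketched in a few lines (Python; a and b are hypothetical fitted coefficients, not from any real model):

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# hypothetical logistic regression fit: logit(p) = a + b * x
a, b = -0.4, 0.8

p_x1 = inv_logit(a + b)  # predicted prevalence at x = 1
p_x0 = inv_logit(a)      # predicted prevalence at x = 0
prevalence_ratio = p_x1 / p_x0
```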