GLM with extra linear constraints on the coefficients

techniques

#1

Hi all,

I am trying to rewrite a GLM in Stan (based on a non-Bayesian model someone else in my company wrote in MATLAB many years ago). There is a set of several related parameters; all of them are also 0-1 (but not necessarily exclusive).

In the MATLAB model, these parameters are subject to a constraint that sets their overall trend to 0 (i.e. if one were to fit a linear regression to the coefficients g1, g2, g3, g4, g5 with x = 1, 2, 3, 4, 5, the slope would be 0). It does this by adding an extra row to the model matrix, with entries of 0 for every variable except g1-g5. Those have entries c1-c5, chosen so that \sum_{i=1}^{5} c_i g_i = 0. They then wrote a new IRLS function that solves for the coefficients while enforcing this constraint.

Is there any way to do something like this in Stan: either set a linear constraint on a set of coefficients directly, or add a row of dummy predictors to the model matrix and solve it all at the same time?

Sorry if the above was unclear, I can try to explain further if necessary.


#2

This seems like a weird restriction, but whatever.

In the transformed data block, center your x to get z but leave off the fifth element

transformed data {
  row_vector[4] z = [-2,-1,0,1];
}

In the parameters block, declare the free parameters

parameters {
  vector[4] g;
  ...
}

In the transformed parameters block, solve the normal equations for the value of g5 that makes the slope zero.

transformed parameters {
  real g5 = dot_product(z, g) * -0.5;
}
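For reference, the factor of -0.5 can be derived as follows, taking the regression of the five coefficients on x = 1, ..., 5 described in the first post. The symbol z_i for the centered x values is shorthand introduced here, not notation from the MATLAB model.

```latex
\begin{aligned}
\text{slope} &= \frac{\sum_{i=1}^{5} z_i g_i}{\sum_{i=1}^{5} z_i^2},
  \qquad z_i = x_i - \bar{x} = i - 3, \\
\text{slope} = 0 &\iff -2 g_1 - g_2 + 0 \cdot g_3 + g_4 + 2 g_5 = 0 \\
&\iff g_5 = -\tfrac{1}{2} \sum_{i=1}^{4} z_i g_i .
\end{aligned}
```

This suggests the constraint row entries c1-c5 are proportional to i - 3, although the thread does not state the actual values used in MATLAB.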

In the model block, stick g5 onto g

model {
  vector[5] beta;
  beta[1:4] = g;
  beta[5] = g5;
  // use beta in your likelihood
}
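Putting the blocks above together, a minimal end-to-end sketch might look like the following. The data names N, X, and y, the Gaussian likelihood, and the normal(0, 1) priors are all assumptions for illustration and are not part of the original model; it also assumes the design matrix holds only the five constrained columns.

```stan
data {
  int<lower=1> N;      // number of observations (assumed name)
  matrix[N, 5] X;      // predictor columns for g1..g5 only (assumed layout)
  vector[N] y;         // outcome (assumed name)
}
transformed data {
  // centered x for the first four coefficients
  row_vector[4] z = [-2, -1, 0, 1];
}
parameters {
  vector[4] g;             // free coefficients g1..g4
  real<lower=0> sigma;     // residual scale (assumed Gaussian likelihood)
}
transformed parameters {
  // g5 solves -2*g1 - g2 + 0*g3 + g4 + 2*g5 = 0
  real g5 = dot_product(z, g) * -0.5;
}
model {
  // equivalent to filling beta[1:4] and beta[5] separately
  vector[5] beta = append_row(g, g5);
  g ~ normal(0, 1);        // placeholder prior (assumption)
  sigma ~ normal(0, 1);    // placeholder half-normal prior (assumption)
  y ~ normal(X * beta, sigma);
}
```

Note that the prior here is placed only on the four free coefficients, so g5 receives whatever prior the constraint implies rather than its own normal(0, 1).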

#3

Thanks for responding so clearly. This answer seems to have worked!

One follow-up (for now):

In the above example, there are 5 coefficients. If I want to change the function so that there are K coefficients (i.e. K as input data), is there a way I can rewrite the part that you put in the transformed data block to be dynamic?

I tried

int z0 = (K-1)/2;
row_vector[K-1] z = [(-z0):(z0-1)];

but that did not work – I received the error:

 error in 'model1e4c3bd969f0_loglinear_constr_w_prior' at line 14, column 22
  -------------------------------------------------
    12: transformed data { 
    13:   int z0 = (K-1)/2;
    14:   row_vector[K-1] z = [(-z0):(z0-1)];
                             ^
  -------------------------------------------------

PARSER EXPECTED: <expression>

Do you know if there is a way to do it dynamically? If not, I can think of a workaround: create the vector in R and pass it in as data. But it would be more elegant to do it in Stan directly.

Thanks again for your help!


#4

@AJF, it really helps to have the whole error message. I’m guessing there’s a bit more after PARSER EXPECTED...?


#5

PARSER EXPECTED: <expression>
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'loglinear_constr_w_prior' due to the above error.

That doesn’t seem any more informative to me…?


#6

Yup. Nothing more informative there. Often there is a Stan compiler message after that, which is why I asked. (On the forums, it’s good to just paste the whole output for that reason; it’s hard to tell what’s been left out, and that often has important info. Here, not so much.)

Someone will need to dig in order to figure this one out.

@mitzimorris or @Bob_Carpenter, is this something we should create an issue for?


#7

@AJF, just looked at the thread again. Can’t really help without the model. It’s a syntax error, so gotta have something that we can run to help debug. If you post the full program, I’ll take a look.


#8

What the parser is saying is that what you’ve got on the right-hand side isn’t an expression. You’re trying to use Stan’s slice indexing operator (:), but that’s not allowed here.

cf. Stan language manual, section 4.3 “Vector, Matrix, and Array Expressions”:

Vector Expressions
Square brackets may be wrapped around a sequence of comma separated primitive expressions to produce a row vector expression.

[edit: escaped code block]
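One way to build z for a general K, offered as a sketch rather than a tested answer from this thread, is to fill the row vector in a loop inside transformed data. The names z_K and g_K are hypothetical, and K is assumed to be declared in the data block as in the question.

```stan
data {
  int<lower=2> K;            // number of constrained coefficients
}
transformed data {
  row_vector[K - 1] z;       // centered positions of the first K-1 coefficients
  real z_K = (K - 1) / 2.0;  // centered position of the K-th coefficient
  for (k in 1:(K - 1)) {
    z[k] = k - (K + 1) / 2.0;  // x_k minus the mean of 1..K
  }
}
parameters {
  vector[K - 1] g;           // free coefficients
}
transformed parameters {
  // last coefficient chosen so that the centered positions times the
  // coefficients sum to zero over all K, i.e. the fitted slope is zero
  real g_K = -dot_product(z, g) / z_K;
}
```

For K = 5 this reproduces z = [-2, -1, 0, 1] and the factor -0.5 from the earlier reply. Dividing by 2.0 rather than 2 avoids integer division, so the centering should also hold for even K. Recent Stan releases also provide linspaced_row_vector, which may allow a one-line version.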


#9

Thanks, @mitzimorris!


#10

It should have also given you a line number and pointer to the exact location in the source file where it was expecting to find an expression but instead found something else.

@mitzimorris’s follow-up post highlights this.