Error in compiling partially-hierarchical multinomial logistic regression

Hi all, I keep getting a persistent error when trying to specify this model, and the line numbers in the error message aren’t very helpful. Would anyone be willing to point me towards a solution?

Here’s my code:

data {
  int<lower=1> N;  // number of tableaux -- that is to say, number of datapoints
  int<lower=0> max_num_candidates;
  array[N] int<lower=2, upper=max_num_candidates> J; // vector of num_candidate in each tableau
  int<lower=1> K; //  number of constraints; this is constant across tableaux
  array[K] int<lower=0, upper=1> is_constraint_hierarhical;
  int<lower=1> L; // number of levels of the random effect
  array[N] int<lower=1, upper=L> ll; // an array telling you which level of the random effect a given UR belongs to.
  array[N] int<lower=1,upper=max_num_candidates> Y; // index of winning candidate in each tableau
  array[N] matrix[max_num_candidates, K] violations; // violation profiles, padded out to max_num_candidates rows
}

parameters {
  array[K] real mu; // group-level constraint weights (coefficients)
  array[K] real<lower=0> sigma; // the standard deviation for each constraint
  array[L] vector[K] beta; // random-effect-level coefficient vectors, one per level of the random effect; drawn from the group-level mu
  //matrix[L, K] beta;
}



model {
  for (constraint_index in 1:K) { // for each constraint -
    mu[constraint_index] ~ normal(0, 100); // draw its mean from a hyperprior distribution
    if (is_constraint_hierarhical[constraint_index] == 1) {
      // hierarchical constraint: draw a level-specific random offset and put it in
      // the relevant slot of the level-by-constraint coefficient matrix
      for (l in 1:L) { // then for each level of the random effect
        // draw the level-specific offset from a distribution centered on the group-level mu for this constraint
        beta[l, constraint_index] ~ normal(mu[constraint_index], sigma[constraint_index]);
      }
    }
    else { // otherwise skip one level of hierarchy and just put the group-level mu in
      for (l in 1:L) { // then for each level of the random effect
        beta[l, constraint_index] = mu[constraint_index]; // just slot in the group-level mu
      }
    }
  }
  //print(size(beta));
  for (i in 1:N) { // then for each datapoint
    //relevant_intercept_index = ll[n];
    // look up the random-effect level ll[i] this datapoint belongs to, pull out that level's beta,
    // and multiply it by the (blocked) violation matrix for this tableau
    Y[i] ~ categorical_logit(block(violations[i], 1, 1, J[i], K) * beta[ll[i]]);
  }
}


generated quantities {
  array[N] int sim_winners;
  for (i in 1:N){
    sim_winners[i] = categorical_logit_rng(block(violations[i], 1, 1, J[i], K)*beta[ll[i]]);
  }
}

And I’m getting the error:

Semantic error in 'string', line 27, column 8 to column 58:

Cannot assign to global variable 'beta' declared in previous blocks.

The model compiles and fits fine when the K constraints are not given any hierarchical structure:

data {
  int<lower=1> N;  // number of tableaux -- that is to say, number of datapoints
  int<lower=0> max_num_candidates;
  array[N] int<lower=2, upper=max_num_candidates> J; // vector of num_candidate in each tableau
  int<lower=1> K; //  number of constraints; this is constant across tableaux
  array[N] int<lower=1,upper=max_num_candidates> Y; // index of winning candidate in each tableau
  array[N] matrix[max_num_candidates, K] violations; // violations
}

parameters {
  vector<lower=0>[K] beta;  // constraint weights; a single vector shared across all datapoints
}

model {
  beta ~ normal(0,1);
  for (i in 1:N)
    // block() pulls out a sub-matrix: here the first J[i] rows (the real, non-padded candidates)
    // and all K columns of the i'th tableau's violation matrix, taken from the stack violations
    // (which is num_tableaux * max_num_candidates (padded out) * num_constraints in size)
    Y[i] ~ categorical_logit(block(violations[i], 1, 1, J[i], K)*beta); // one token at a time; could we ramp up to multinomial rather than categorical?
}

generated quantities {
  array[N] int sim_winners;
  for (i in 1:N)
    sim_winners[i] = categorical_logit_rng(block(violations[i], 1, 1, J[i], K)*beta);
}

So I assume something odd is going on in the way I tried to enrich it with hierarchical structure.

Thanks very much in advance; I appreciate any help I can get!

EDIT: I realize the error I originally had was due to an extra parenthesis (deep sigh…); after fixing that and a few more random issues, I’m still stuck on the error above. I think it has to do with mixing ~ and = when populating the beta matrix, but I don’t really know another way around it. Any suggestions would be very welcome!

EDIT: I found I could get rid of the error by replacing beta[l, constraint_index] = mu[constraint_index] with something hacky like beta[l, constraint_index] ~ normal(mu[constraint_index], 0.00000001), but that doesn’t really feel like a good solution, so I’d still really appreciate any help folks might have!

The key thing here is that you shouldn’t be trying to assign values to a parameter (i.e., beta[l, constraint_index] = mu[constraint_index]). Instead, you need to declare as parameters only the values of beta that actually need to be parameters, and then construct the full beta matrix/array in the transformed parameters or model block from those beta parameters and the mu parameters.

Thanks so much for your response; I’m not exactly sure how to do this, would you mind giving an example? One thought might be to do something like this:

model {
  for (constraint_index in 1:K) {
    if (is_constraint_hierarhical[constraint_index] == 1) {
      mu[constraint_index] ~ normal(0, 100);
      for (l in 1:L) {
        beta[l, constraint_index] ~ normal(mu[constraint_index], sigma[constraint_index]);
      }
    }
    else {
      for (l in 1:L) {
        beta[l, constraint_index] ~ normal(0, 100);
      }
    }
  }
  for (i in 1:N) { // then for each datapoint
    Y[i] ~ categorical_logit(block(violations[i], 1, 1, J[i], K) * beta[ll[i]]);
  }
}

But wouldn’t this miss the intended structure? For a given constraint k, the idea is to sample either around the mu with noise or use the mu without it, depending on the constraint, rather than having the mu without noise be unrelated to the one with noise.

Thanks again for your help, I really appreciate it.

The end goal is that you only want a subset of beta to be parameters that are distributed normal(mu[constraint_index], sigma[constraint_index]), with the rest of the values being mu[constraint_index], if I’m reading your model right.

So you only declare the subset of values of beta that need to be parameters as parameters, then you combine those parameters with the appropriate mu values to construct your full beta.

To use a simpler example, the below model has a single vector beta of length K:

data {
  int K;
  array[K] int hier_constraint;
}

transformed data {
  int num_hier = sum(hier_constraint);
}

parameters {
  vector[num_hier] beta_params; // These are given the appropriate normal(mu[k], sigma[k]) priors
  vector[K] mu;
  vector<lower=0>[K] sigma;
}

transformed parameters {
  vector[K] beta;
  {
    // local variable to index which parameter value to use
    int param_ind = 1; 
    for (k in 1:K) {
      if (hier_constraint[k] == 1) {
        beta[k] = beta_params[param_ind];
        param_ind += 1;
      } else {
        beta[k] = mu[k];
      }
    }
  }
}
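To make the indexing concrete, here is a made-up illustration (not part of the model): with K = 3 and hier_constraint = {1, 0, 1}, num_hier is 2 and the loop fills beta as

beta[1] = beta_params[1]; // hier_constraint[1] == 1, so take the 1st parameter
beta[2] = mu[2];          // hier_constraint[2] == 0, so plug in the group-level mu
beta[3] = beta_params[2]; // hier_constraint[3] == 1, param_ind had advanced to 2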

Hi @andrjohns, thanks a lot for this; I really appreciate your help, and I think I understand most of what you suggested. I implemented it, interleaving it with the multiple levels of hierarchy that I want, and that went fine, but I don’t think it implements your comment anywhere that the beta_params have a mean drawn from normal(mu[k], sigma[k]). I tried to do this in the third line of the model block,
beta_params ~ normal(mu,sigma); , but this gives the error Semantic error in 'string', line 42, column 0 to column 31: Ill-typed arguments to '~' statement. No distribution 'normal' was found with the correct signature. which I don’t really understand; isn’t this where I specify the priors on the beta_params? My model is syntactically correct if I comment out that third line of the model block (the one starting with beta_params), but I don’t think it then does what I intend.

The model that throws the error above is:

data {
  int<lower=1> N;  // number of tableaux -- that is to say, number of datapoints
  int<lower=0> max_num_candidates;
  array[N] int<lower=2, upper=max_num_candidates> J; // vector of num_candidate in each tableau
  int<lower=1> K; //  number of constraints; this is constant across tableaux
  array[K] int<lower=0, upper=1> is_constraint_hierarhical_by_items;
  int<lower=1> L; // number of levels of the random effect
  array[N] int<lower=1, upper=L> ll; // an array telling you which level of the random effect a given UR belongs to.
  array[N] int<lower=1,upper=max_num_candidates> Y; // index of winning candidate in each tableau
  array[N] matrix[max_num_candidates, K] violations; // violations
}


transformed data {
  int num_constraints_hierarchical_on_items = sum(is_constraint_hierarhical_by_items);
}


parameters {
  array[K] real mu; // group-level constraint weights (coefficients)
  array[K] real<lower=0> sigma; // the standard deviation for each constraint
  array[L] vector[num_constraints_hierarchical_on_items] beta_params; // random-effect-level coefficient vectors, one per level of the random effect, covering only those constraints that are hierarchical; drawn from the group-level mu
  //matrix[L, K] beta;
}


transformed parameters {
  array[L] vector[K] beta;
  {
    // local variable to index which parameter value to use
    int param_ind = 1;
    for (constraint_index in 1:K) {
      if (is_constraint_hierarhical_by_items[constraint_index] == 1) {
        for (level_of_item_hierarchical_structure in 1:L) {
          beta[level_of_item_hierarchical_structure, constraint_index] = beta_params[level_of_item_hierarchical_structure, param_ind];
          param_ind += 1;
        }
      } else {
        for (level_of_item_hierarchical_structure in 1:L) {
          beta[level_of_item_hierarchical_structure, constraint_index] = mu[constraint_index];
        }
      }
    }
  }
}




model {
  mu ~ normal(0,10); // prior on each group-level mu
  sigma ~ normal(0,1); // prior on each sigma
  beta_params ~ normal(mu,1); // the scale here controls how far individual random offsets can be from mu
  //print(size(beta));
  for (i in 1:N) { // then for each datapoint
    // look up the random-effect level ll[i] this datapoint belongs to, pull out that level's beta,
    // and multiply it by the (blocked) violation matrix for this tableau
    Y[i] ~ categorical_logit(block(violations[i], 1, 1, J[i], K) * beta[ll[i]]);
  }
}


generated quantities {
  array[N] int sim_winners;
  for (i in 1:N){
    sim_winners[i] = categorical_logit_rng(block(violations[i], 1, 1, J[i], K)*beta[ll[i]]);
  }
}


Would you mind pointing out where I’ve gone wrong here?

Looking at the parameters/statements of interest here, you’ve got:

parameters {
  array[K] real mu;
  array[L] vector[num_constraints_hierarchical_on_items] beta_params; 
}

model {
  mu ~ normal(0,10);
  beta_params ~ normal(mu,1);
}

Your parser error is happening because you’re using an array of reals (mu) as the location parameter for an array of vectors (beta_params), which isn’t a valid signature. But that’s not really the main issue here.

The key component that you’re missing is something that tells Stan which element of mu should be used as the location parameter for which element of a given beta_params[l] - as not all elements of mu are used in the prior (only the ones corresponding to hierarchical constraints).

This means that you need to update the sampling statement for beta_params to only be using the values of mu which correspond to a hierarchical constraint.

One approach for this would be:

model {
  mu ~ normal(0,10); // prior on mean of each mu
  sigma ~ normal(0,1); // prior on sigma for that beta_param

  for (l in 1:L) {
    // local variable to index which parameter value to use
    int param_ind = 1;
    for (k in 1:K) {
      // Extract the appropriate value of mu for specifying the 
      // prior for beta_params elements
      if (is_constraint_hierarhical_by_items[k] == 1) {
        beta_params[l, param_ind] ~ normal(mu[k], 1);
        param_ind += 1;
      }
    }
  }
}

EDIT: Updated the placement of param_ind
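A slightly more vectorized variant of the same idea (just a sketch - hier_idx is a made-up helper name, not anything your model already has) would be to precompute the positions of the hierarchical constraints once in transformed data and use multi-indexing in the prior:

transformed data {
  int num_constraints_hierarchical_on_items = sum(is_constraint_hierarhical_by_items);
  // positions of the hierarchical constraints within 1:K
  array[num_constraints_hierarchical_on_items] int hier_idx;
  {
    int param_ind = 1;
    for (k in 1:K) {
      if (is_constraint_hierarhical_by_items[k] == 1) {
        hier_idx[param_ind] = k;
        param_ind += 1;
      }
    }
  }
}

model {
  mu ~ normal(0,10);
  sigma ~ normal(0,1);
  for (l in 1:L) {
    // mu[hier_idx] picks out only the mus for the hierarchical constraints
    // (sigma[hier_idx] could be used the same way for the scale)
    beta_params[l] ~ normal(mu[hier_idx], 1);
  }
  // ... likelihood as before
}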


Hi @andrjohns, I realized what went wrong and fixed it. Thank you so much for your help with this!
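For later readers: putting the pieces together, the model block presumably ends up looking roughly like the sketch below, which combines the transformed parameters construction from earlier in the thread with the prior loop from the previous reply (using sigma[k] rather than the fixed scale of 1 is an assumption about the intended model):

model {
  mu ~ normal(0,10);
  sigma ~ normal(0,1);

  for (l in 1:L) {
    // param_ind resets for every level, mirroring the column order of beta_params
    int param_ind = 1;
    for (k in 1:K) {
      if (is_constraint_hierarhical_by_items[k] == 1) {
        beta_params[l, param_ind] ~ normal(mu[k], sigma[k]);
        param_ind += 1;
      }
    }
  }

  for (i in 1:N) {
    Y[i] ~ categorical_logit(block(violations[i], 1, 1, J[i], K) * beta[ll[i]]);
  }
}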