Basic Data Parameterization Issue


I’m trying to fit a model that predicts whether an individual requires help to care for themselves in a longitudinal survey (binary outcome, repeated measure) based on a series of characteristics measured at baseline and others that are measured repeatedly. Our particular interest is in how 3 of those characteristics (all binary) interact — preferences, expression and race. I’m struggling with how to setup the data. Given that all of the data are ints, I initially set things up as an int arrays (2 dimensional for variables that change over time and 1 dimensional for variables that are only measured at baseline) and iterated across all patients (N) and survey wave (M) in my model. That worked when I put preferences, expression and race in as fixed effects in a logit model. When I’ve tried to put them in as partially pooled parameters, I’m struggling to figure out how to parameterize the data/specify the model. Now, when I loop I’m getting syntax errors:

  1. No matches for:

real[] * real

expression is ill formed
error in ‘unkown file name’ at line 81, column 102
79: m*waveCoeff +
80: patientIntercept[n] +
81: alphaPreference[preference] * prefSD ) ;
82: }

I’m new to Stan and Bayesian modeling — this is my first attempt to fit something appproximating a model that I’d actually care about. Its pretty clear that I don’t really understand what I’m looping over or what the indices are referring to… It seems to me that there shouldn’t be a place where i’m multiplying a real[] by a real, unless I don’t understand what the data-baed indices (preference, expression , race) are doing and/or they’re not working the way I intuitively think they should be working.

I’m quite sure that there is a vector-based way that I could parameterize the model, but in order to make sure that I understood exactly what was happening, I was hoping to do this with loops…

Model is

data	{
	int<lower=0> N;	//number of individuals
	int<lower=0> M; //waves of data
	int<lower=1, upper=2> preference[N];		//1 = non=aggressive, 2 = aggressive
	int<lower=1, upper=2> expression[N];		//1 = no expression, 2 = expression
    int<lower=1, upper=2> race[N];				//1 = white, 2 = black
	int<lower=1, upper=8> prefExpressionRaceCat[N];	//3 way interaction...index 1 to 8 for each category
    int<lower=1, upper=6> baselineAgeCat[N];

    int<lower=0> physicalCapacity[N,M];
    int<lower=0> clockDraw[N,M];
    int<lower=0> memory[N,M];
    int<lower=0> selfCareLimitationCount[N,M];
    int<lower=0> anySelfCareActivityLimitation[N,M];
	int<lower=0> dead[N,M];    

parameters	{
	//intercepts for preference, expression, race and 3-way interaction groups
	real alphaPreference[2];
	real alphaExpression[2];
	real alphaRace[2];
	real alphaPrefExpressionRaceInteraction[8];	//3 way interaction between preferences, expression and race.	
	real meanPatientIntercept;
	real<lower=0> sdPatientIntercept;
	real<lower=0> prefSD;
	real<lower=0> expressionSD;
	real<lower=0> raceSD;
	real<lower=0> prefExpressionRaceSD;
	real physicalCapacityCoeff;
	real clockDrawCoeff;
	real memoryCoeff;
	real ageCoeff;
	real waveCoeff;
	real patientIntercept[N];
	real alpha;

	//fixed effects
	physicalCapacityCoeff ~ normal(0,1);
	clockDrawCoeff ~ normal(0, 1);
	memoryCoeff ~ normal(0, 1);
	ageCoeff ~ normal(0,1);		//can probably put a non-centered prior on this as well
	waveCoeff ~ normal(0, 1);		//can probably put a non-normal prior on this...disabiltiy has to go up with waves
	alpha ~ normal(0, 1);
	//partially pooled effects
	alphaPreference ~ normal(0, 1);
	alphaExpression ~ normal(0, 1);
	alphaRace ~ normal(0, 1);
	alphaPrefExpressionRaceInteraction ~ normal(0, 1);
	//description of partially pooled effect sizes
	prefSD ~ normal(0, 1);
	expressionSD ~ normal(0, 1);
	raceSD ~ normal(0, 1);
	prefExpressionRaceSD ~ normal(0, 1);
	//random patient-level intercept
	sdPatientIntercept ~ normal(0, 1);
	meanPatientIntercept ~ normal(0, 1);
	patientIntercept ~ normal(meanPatientIntercept, sdPatientIntercept);

	for (n in 1:N)	{
		for (m in 1:M)	{		//note to self: come back and try to vectorize this...
			anySelfCareActivityLimitation[n,m] ~ bernoulli_logit(alpha +
																physicalCapacityCoeff *physicalCapacity[n,m] +
																clockDrawCoeff * clockDraw[n,m] + 
																memoryCoeff * memory[n,m] + 
																ageCoeff * baselineAgeCat[n] + 
																m*waveCoeff + 
																patientIntercept[n]  +
																alphaPreference[preference] *  prefSD  +
																alphaExpression[expression] * expressionSD + 
																alphaRace[race] * raceSD + 
																alphaPrefExpressionRaceInteraction[prefExpressionRaceCat] * prefExpressionRaceSD )  ;

generated quantities   {
    int y_ppc[N,M];
    real phat_ppc = 0;

    for (n in 1:N)  {
    	for (m in 1:M)	{
        	y_ppc[n,m] = bernoulli_logit_rng(alpha +
																physicalCapacityCoeff *physicalCapacity[n,m] +
																clockDrawCoeff * clockDraw[n,m] + 
																memoryCoeff * memory[n,m] + 
																ageCoeff * baselineAgeCat[n] + 
																m*waveCoeff + 
																patientIntercept[n] +
																alphaPreference[preference] *  prefSD  +
																alphaExpression[expression] * expressionSD + 
																alphaRace[race] * raceSD + 
																alphaPrefExpressionRaceInteraction[prefExpressionRaceCat] *prefExpressionRaceSD )  ;
        	phat_ppc = phat_ppc + y_ppc[n,m];
    phat_ppc = phat_ppc / (N*M);

It is more that real arrays do not work like you (and lots of other people) think they should be working. You cannot do element-wise math with them like you can in R and other languages. A solution to your problem is to use a vector rather than a one-dimensional real array. Thus, instead of doing

real alphaPreference[2];


vector[2] alphaPreference

and so forth for the other real arrays in the parameters block.

Your Stan program is otherwise not that bad, but it is difficult to jump into Stan and immediately do some hierarchical longitudinal thing. I would either start with the stan_glmer function in the rstanarm R package, the brm function in the brms package, or a flat logit model in the Stan language.

first, many thanks — that worked like a champ. can you, or anybody, help me understand why real arrays and vectors are behave that way? going through the manual, my gloss was that they’re the same (but, stored differently) and that the main virtue of vectors was that Stan does a lot of smart linear algebra with them…this seems to speak to a more fundamental different in how Stan thinks about them…

second, fear not…i’ve been playing with variants on this model on and off for a week. i stated with simple logit models and have been slowly building up to add complexity. this was the first time i hit a snag where i couldn’t figure out how to unsnag myself.

A scalar times a vector yielding a vector is considered to be a valid linear algebra operation. Moreover, scalar-vector multiplication is supported by Eigen, which is what is being utilized if you use a vector, row_vector, matrix or specialization thereof in Stan. The standard C++ library is what is utilized if you use a real array, which does not define math operations like multiplication by a scalar.

that makes perfect sense. many thanks…a little insight into whats happening under the hood is quite helpful…