This is a general query: are there examples of Stan models and code that allow for a negative intraclass correlation within multilevel clusters?

In our case, we might be examining a situation where a positive binary outcome for a unit within a cluster is accompanied by a drop in probability for all other units within that cluster. If necessary, we might try to bend discrete choice models to our purposes, but modeling a negative intraclass correlation would (in principle) be more straightforward.

A standard way in Stan to model correlations is to estimate the posterior of the correlation matrix of all the intraclass members. Stan has the built-in LKJ family of correlation matrix priors for this purpose.

(Aside – I am skipping any differences between continuous and non-continuous variables here; if looking at binary variables, I would normally work on the logit (or probit, etc.) scale. You may differ, but that’s beside my point – I wish to speak only about the correlations themselves.)

This approach, however, does not restrict the individual correlation coefficients in any way, other than to ensure the entire matrix is positive definite. So if you want every pair of members to share the same correlation, this will not work. In that case, you can model the common correlation directly and construct the matrix manually. An important issue here, however, is that there is a lower bound (greater than -1 once the matrix is larger than 2 by 2) on the common coefficient to ensure positive definiteness, so you would have to ensure this bound was respected. Further, this bound is a function of the size of the correlation matrix, so if you had different-sized clusters, you’d have to keep track of the different bounds. (The bound is -1 / (N-1) for an N by N matrix: when \rho is negative, the smallest eigenvalue of such a matrix is 1 + (N-1)\rho, which must be positive.)
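Checking that bound via the eigenvalues of the common-correlation matrix gives the following. The symbols R, I_N, and the vector of ones are notation introduced here for the check, not something from the original post.

$$
R = (1-\rho)\, I_N + \rho\, \mathbf{1}\mathbf{1}^\top
$$

$$
\lambda_1 = 1 + (N-1)\rho \quad (\text{eigenvector } \mathbf{1}), \qquad
\lambda_2 = \dots = \lambda_N = 1 - \rho \quad (\text{vectors orthogonal to } \mathbf{1})
$$

$$
R \text{ positive definite} \iff -\frac{1}{N-1} < \rho < 1
$$

So a cluster of 2 allows any \rho > -1, a cluster of 3 needs \rho > -1/2, and a cluster of 11 needs \rho > -0.1: the room for negative correlation shrinks quickly as clusters grow.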

Both of these approaches allow for negative correlations if the data suggest them, subject to positive definiteness. I suppose in the latter case you could enforce negativity through your choice of prior, if that is what you wanted to do. I am not aware, though, of any way to do this in the more unrestricted first case.
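As a rough illustration of the common-correlation route with negativity enforced, here is a minimal sketch in plain Python rather than Stan. It assumes a cluster of size n >= 2 and maps an unconstrained value z to a correlation between -1/(n-1) and 0, then confirms positive definiteness by attempting a Cholesky factorization. All function names are hypothetical, and the scaled-logistic map is just one possible choice.

```python
import math

def rho_lower_bound(n):
    """Lower bound on a common correlation for an n-by-n matrix (assumes n >= 2)."""
    return -1.0 / (n - 1)

def negative_rho(z, n):
    """Map an unconstrained real z to a correlation between -1/(n-1) and 0.

    Uses the logistic function written via tanh for numerical stability.
    For extreme z the result saturates at an endpoint in floating point.
    """
    logistic = 0.5 * (1.0 + math.tanh(0.5 * z))
    return rho_lower_bound(n) * logistic

def exchangeable_corr(n, rho):
    """Build the n-by-n matrix with ones on the diagonal and rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

def is_positive_definite(a):
    """Attempt a Cholesky factorization; a non-positive pivot means not PD."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                chol[i][j] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return True

rho_ok = negative_rho(0.0, 4)   # halfway between the bound -1/3 and 0
rho_bad = -0.4                  # below the bound for a 4-by-4 matrix
ok_is_pd = is_positive_definite(exchangeable_corr(4, rho_ok))
bad_is_pd = is_positive_definite(exchangeable_corr(4, rho_bad))
```

In a Stan model the same idea would presumably be expressed with a bounded parameter declaration rather than a hand-written map, but the bound would still have to be computed from the cluster size.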

The different-sized clusters pose a challenge, admittedly.

This is a case where, if a positive intra-class correlation were anticipated, it would be straightforward to implement a basic multilevel model with random intercepts corresponding to the clusters. Heterogeneous cluster size is seldom a big challenge in that context.

In some ways, this problem is almost like a discrete choice model, but with added higher-level clustering. In that respect, it’s almost like expecting consumers to pick among lots of different Kirkland (Costco), Market Pantry (Target), or Great Value (Wal-Mart) products in order to understand the consumer’s latent affinity for one store or another. But once one product from a brand is chosen on a shopping trip, the probability of choosing another product from that brand drops precipitously. And meanwhile, sometimes the consumer doesn’t get any product from the brand.

You can’t just rely on the theoretical lower bound: the requirements for positive definiteness change based on the other correlation values, so in anything but very low-dimensional cases you need to construct the matrix in a way that assures positive definiteness. I don’t really understand the problem here, though. What stops you from simply estimating the correlation matrix? Do you need the correlation matrix to vary based on some covariate? If it’s just based on a binary outcome, then estimate two correlation matrices.

Also, be wary of the correlation matrix as a generative structure; it’s often treated as a universal correlation capture mechanism, but implies a very specific/limited generative process. I learned this the hard way. These days unless I have very strong prior information to support choice of a multivariate normal as a generative structure, I opt for designing relationships by hand using SEM.

Ok, here’s a stab at the data structure (the actual dataset has nothing to do with snacks, by the way):

| Individual | Family | Product | Store | Y |
|---|---|---|---|---|
| 1 | A | GV raisins | Wal-Mart | 1 |
| 1 | A | GV peanuts | Wal-Mart | 0 |
| 1 | A | GV crackers | Wal-Mart | 0 |
| 1 | A | MP raisins | Target | 0 |
| 1 | A | MP peanuts | Target | 1 |
| 1 | A | Kirkland crackers | Costco | 0 |
| 1 | A | Kirkland peanuts | Costco | 0 |
| 2 | A | GV raisins | Wal-Mart | 0 |
| 2 | A | GV peanuts | Wal-Mart | 0 |
| 2 | A | GV crackers | Wal-Mart | 0 |
| 2 | A | MP raisins | Target | 0 |
| 2 | A | MP peanuts | Target | 0 |
| 2 | A | Kirkland crackers | Costco | 1 |
| 2 | A | Kirkland peanuts | Costco | 1 |
| 3 | B | GV raisins | Wal-Mart | 0 |
| 3 | B | GV peanuts | Wal-Mart | 0 |
| 3 | B | GV crackers | Wal-Mart | 0 |
| 3 | B | MP raisins | Target | 0 |
| 3 | B | MP peanuts | Target | 0 |
| 3 | B | Kirkland crackers | Costco | 0 |
| 3 | B | Kirkland peanuts | Costco | 0 |

The goal of the analysis is to infer some latent propensity between families and stores based on the snacks that individual members of the families acquire. In general, we observe that when individuals have purchased one type of snack from a store, the odds of purchasing any other snack from that store drop precipitously (but this is not a hard constraint, as seen with individual 2, who purchases two types of snacks from Costco).

And then there are some consumers who don’t buy any snacks, such as person 3.

One option is to marginalize over the problem by dichotomizing within individual-store combinations – that is, did individual i purchase any snacks of any kind from store j? But that potentially discards useful information (as in the case of individual 2) while also precluding possibilities for modeling covariates that are specific to the person-product combinations.
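To make the information loss concrete, here is a small Python sketch of that dichotomization applied to the example rows. The function names are hypothetical and the tuples simply restate the example data; it is meant only to show what collapsing to "any purchase" keeps and what it drops.

```python
# Example rows: (individual, family, product, store, y)
rows = [
    (1, "A", "GV raisins", "Wal-Mart", 1),
    (1, "A", "GV peanuts", "Wal-Mart", 0),
    (1, "A", "GV crackers", "Wal-Mart", 0),
    (1, "A", "MP raisins", "Target", 0),
    (1, "A", "MP peanuts", "Target", 1),
    (1, "A", "Kirkland crackers", "Costco", 0),
    (1, "A", "Kirkland peanuts", "Costco", 0),
    (2, "A", "GV raisins", "Wal-Mart", 0),
    (2, "A", "GV peanuts", "Wal-Mart", 0),
    (2, "A", "GV crackers", "Wal-Mart", 0),
    (2, "A", "MP raisins", "Target", 0),
    (2, "A", "MP peanuts", "Target", 0),
    (2, "A", "Kirkland crackers", "Costco", 1),
    (2, "A", "Kirkland peanuts", "Costco", 1),
    (3, "B", "GV raisins", "Wal-Mart", 0),
    (3, "B", "GV peanuts", "Wal-Mart", 0),
    (3, "B", "GV crackers", "Wal-Mart", 0),
    (3, "B", "MP raisins", "Target", 0),
    (3, "B", "MP peanuts", "Target", 0),
    (3, "B", "Kirkland crackers", "Costco", 0),
    (3, "B", "Kirkland peanuts", "Costco", 0),
]

def dichotomize(rows):
    """Collapse to one 0/1 value per (individual, store): any purchase at all?"""
    out = {}
    for individual, _family, _product, store, y in rows:
        key = (individual, store)
        out[key] = max(out.get(key, 0), y)
    return out

def purchase_counts(rows):
    """Number of distinct products bought per (individual, store)."""
    out = {}
    for individual, _family, _product, store, y in rows:
        key = (individual, store)
        out[key] = out.get(key, 0) + y
    return out

any_purchase = dichotomize(rows)
counts = purchase_counts(rows)
```

Here `any_purchase[(2, "Costco")]` is 1 while `counts[(2, "Costco")]` is 2, so the dichotomized outcome cannot distinguish individual 2's two Costco purchases from a single one, and the product-level rows needed for person-product covariates are gone.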

Hence the idea of modeling the negative intra-class correlation in the person-store clusters.

The heterogeneous size of the clusters made it difficult to discern how the effects could be assigned to a covariance matrix.
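For what it is worth, the size dependence can be tabulated directly. The sketch below is plain Python with hypothetical names, and it assumes a single common correlation shared across all person-store clusters. It takes the number of products per store from the example table and computes the lower bound -1/(n-1) each cluster size would impose.

```python
def rho_lower_bound(n):
    """Lower bound on a common correlation within a cluster of n members."""
    if n < 2:
        raise ValueError("a cluster needs at least two members to have a correlation")
    return -1.0 / (n - 1)

# Products per store, as listed in the example table.
products_by_store = {
    "Wal-Mart": ["GV raisins", "GV peanuts", "GV crackers"],
    "Target": ["MP raisins", "MP peanuts"],
    "Costco": ["Kirkland crackers", "Kirkland peanuts"],
}

# Strict lower bound on the common correlation for each store's cluster size.
bounds = {
    store: rho_lower_bound(len(products))
    for store, products in products_by_store.items()
}

# If one correlation is shared by every cluster, the largest cluster binds.
shared_bound = max(bounds.values())
```

With these cluster sizes the Wal-Mart clusters (three products) are the binding case at -0.5, while the two-product clusters would tolerate anything above -1. Whether a single shared correlation is appropriate, as opposed to one per cluster size, is a modeling choice this sketch does not settle.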