I am curious how one might set up a way to use matrices (of the same dimension) as response and predictor variables in brms.
I’ve attempted to do this by converting each matrix into a vector and using the row and column positions in the original matrix as random effects. This works well for non-symmetric matrices, but I’m not sure how one would handle the syntax when the matrix is symmetric, like a distance matrix.
Yes, the issue is not the technical matter of inputting a matrix; it is ensuring that the structure of the distance matrix is accounted for in the model.
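As a minimal sketch of the flattening described above for the non-symmetric case: the matrices `Y` and `X` and the data frame `long` are hypothetical names made up for illustration, and only base R is used, so the formula is built but not fitted.

```r
# Hypothetical 3 x 4 response and predictor matrices of the same dimension
set.seed(1)
Y <- matrix(rnorm(12), nrow = 3, ncol = 4)
X <- matrix(rnorm(12), nrow = 3, ncol = 4)

# row() and col() return index matrices with the same shape as Y, so
# as.vector() flattens all four objects in the same column-major order
long <- data.frame(
  y        = as.vector(Y),
  x        = as.vector(X),
  RowID    = factor(as.vector(row(Y))),
  ColumnID = factor(as.vector(col(Y)))
)

# Formula of the kind described above; fitting it would require brms
f <- y ~ x + (1 | RowID) + (1 | ColumnID)
```

Each cell of the matrix becomes one row of `long`, and the two ID columns carry the original position so that it can enter the model as a grouping factor.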
Here is a toy example with the iris dataset:
library(brms)
data(iris)
# create distance matrix from the first two principal components
iris.dist <- as.matrix(dist(prcomp(iris[, 1:2], scale. = TRUE)$x[, 1:2]))
# scale to mean 0, SD 1
iris.dist <- iris.dist - mean(iris.dist)
iris.dist <- iris.dist / sd(iris.dist)
# four columns, matching the four names assigned below
iris.dat <- data.frame(matrix(ncol = 4, nrow = 0))
colnames(iris.dat) <- c("Distance", "RowID", "ColumnID", "SameSpecies")
for (i in 1:nrow(iris)) {
  for (j in 1:nrow(iris)) {
    if (i != j) { # do not include the distance of an individual to itself
      iris.dat <- rbind(iris.dat, data.frame(
        Distance = iris.dist[i, j], # distance
        RowID = as.character(i), # row ID
        ColumnID = as.character(j), # column ID
        # are the two individuals of the same species?
        SameSpecies = as.numeric(iris$Species[i] == iris$Species[j])
      ))
    }
  }
}
# do individuals of the same species have a smaller Euclidean distance?
brm.iris <- brm(Distance ~ SameSpecies + (1 | RowID) + (1 | ColumnID),
                family = gaussian,
                data = iris.dat, cores = 4,
                init = 0, # older brms versions call this argument `inits`
                prior = prior(normal(0, 2), class = b))
summary(brm.iris)
The problem here is that each distance appears twice (i.e. as 2,5 and 5,2), but without this repetition the RowID and ColumnID variables do not fully capture the dependencies between observations.
Distances also satisfy the triangle inequality, $d(a,b) + d(b,c) \geq d(a,c)$.
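To make the triangle inequality concrete, here is a small check in base R. The helper name `satisfies_triangle` and the tolerance are made up for illustration; the distance matrix is built the same way as in the toy example.

```r
# Hypothetical helper: TRUE if every triple (a, b, c) satisfies
# d(a, b) + d(b, c) >= d(a, c), up to a small numerical tolerance
satisfies_triangle <- function(D, tol = 1e-8) {
  for (b in seq_len(nrow(D))) {
    # outer() builds the matrix whose [a, c] entry is D[a, b] + D[b, c]
    via_b <- outer(D[, b], D[b, ], `+`)
    if (any(D > via_b + tol)) return(FALSE)
  }
  TRUE
}

pts <- prcomp(iris[, 1:2], scale. = TRUE)$x[, 1:2]
D <- as.matrix(dist(pts))

raw_ok    <- satisfies_triangle(D)                     # raw Euclidean distances
scaled_ok <- satisfies_triangle((D - mean(D)) / sd(D)) # centred and scaled
```

The raw Euclidean distances pass the check. The centred and scaled matrix does not, because subtracting the mean makes the diagonal negative, so the standardised response is no longer a distance in the strict sense.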
I’m not exactly sure what you mean by taking that structure into account. It looks like the distance is the observation in a regression in your model, so presumably you’re looking for some structure in those random effects so that you get the same result when switching the row and column IDs. Given the way you set up the regression as SameSpecies + (1 | RowID) + (1 | ColumnID), if you swap the row and column IDs, you’ll get the same prediction, because SameSpecies will be the same.
I suggest using only half of the distance matrix (below the diagonal, say) and then using a multi-membership grouping term to account for the fact that rows and columns refer to the same set of locations:
I don’t think we’re quite there yet, though. Won’t this formula still wind up counting Distance[i, j] and Distance[j, i] as two observations rather than one, and counting Distance[i, i] as an observation even though we know structurally that the result has to be zero in that case?
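The formula from this reply is not preserved here. What follows is a minimal sketch of what a multi-membership term could look like, assuming brms's `mm()` grouping syntax. The toy points, the `species` labels, and the names `half`, `f_mm` and `fit_mm` are hypothetical.

```r
# Toy symmetric distance matrix for 5 hypothetical points
set.seed(1)
pts <- matrix(rnorm(10), ncol = 2)
species <- c("a", "a", "b", "b", "b")
D <- as.matrix(dist(pts))

# Keep only the entries below the diagonal: one row per unordered pair
idx <- which(lower.tri(D), arr.ind = TRUE)
half <- data.frame(
  Distance    = D[idx],
  RowID       = as.character(idx[, "row"]),
  ColumnID    = as.character(idx[, "col"]),
  SameSpecies = as.numeric(species[idx[, "row"]] == species[idx[, "col"]])
)

# Multi-membership term: RowID and ColumnID draw on one shared set of
# group-level effects instead of two separate sets
f_mm <- Distance ~ SameSpecies + (1 | mm(RowID, ColumnID))

# Wrapper so that nothing is compiled unless it is called; assumes brms
# is installed
fit_mm <- function(data) {
  brms::brm(f_mm, family = gaussian(), data = data, cores = 4)
}
```

Calling `fit_mm(half)` would fit the model. With this term each point has a single effect that contributes to every distance it takes part in, whether it shows up as the row or the column.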
A multi-membership grouping term would indeed solve the problem! Perfect, thank you Paul!
This will let me account for the fact that for a hypothetical set of three points X, Y, and Z, the distance between point X and Y is not completely independent of the distance between X and Z or the distance between Y and Z.
By using the multi-membership grouping term, I think Stan/brms could be a potential alternative to the exponential random graph models common in social network analysis, in addition to performing multivariate distance matrix regression (MDMR).
I think I can avoid that by removing the duplicated terms. The code would now look like this:
data(iris)
# create distance matrix from the first two principal components
iris.dist <- as.matrix(dist(prcomp(iris[, 1:2], scale. = TRUE)$x[, 1:2]))
# scale to mean 0, SD 1
iris.dist <- iris.dist - mean(iris.dist)
iris.dist <- iris.dist / sd(iris.dist)
# four columns, matching the four names assigned below
iris.dat <- data.frame(matrix(ncol = 4, nrow = 0))
colnames(iris.dat) <- c("Distance", "RowID", "ColumnID", "SameSpecies")
for (i in 1:nrow(iris)) {
  for (j in 1:nrow(iris)) {
    # keep each pair once (i < j); by symmetry this carries the same
    # information as the lower triangle of the distance matrix
    if (i < j) {
      iris.dat <- rbind(iris.dat, data.frame(
        Distance = iris.dist[i, j], # distance
        RowID = as.character(i), # row ID
        ColumnID = as.character(j), # column ID
        # are the two individuals of the same species?
        SameSpecies = as.numeric(iris$Species[i] == iris$Species[j])
      ))
    }
  }
}
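Growing a data frame with `rbind` inside a double loop gets slow as the number of pairs grows. As an alternative sketch, the same half-matrix data can be built in one step with `which(..., arr.ind = TRUE)`. The names `D`, `idx` and `iris.half` are made up for illustration, and the scaling mirrors the code above.

```r
pts <- prcomp(iris[, 1:2], scale. = TRUE)$x[, 1:2]
D <- as.matrix(dist(pts))
D <- (D - mean(D)) / sd(D) # same centring and scaling as in the loop version

# Index pairs with i < j, matching the condition used in the loop
idx <- which(upper.tri(D), arr.ind = TRUE)
i <- idx[, "row"]
j <- idx[, "col"]

iris.half <- data.frame(
  Distance    = D[idx], # matrix indexing picks D[i, j] for each pair
  RowID       = as.character(i),
  ColumnID    = as.character(j),
  SameSpecies = as.numeric(iris$Species[i] == iris$Species[j])
)
```

This yields one row per unordered pair of individuals. The rows come out in column-major order rather than the loop's row-by-row order, which should not matter for fitting the model.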