Distance matrix regression

Zacco · July 21, 2018, 4:44am

I am curious how one might set up a way to use matrices (of the same dimension) as response and predictor variables in brms.

I’ve attempted to do this by converting each matrix into a vector and using the row and column positions in the original matrix as random effects. This works well for non-symmetric matrices, but I’m not sure how to handle the syntax when the matrix is symmetric, like a distance matrix.

paul.buerkner · July 21, 2018, 6:16am

On a technical level, you can use matrices as elements of a data.frame and then pass it to brms:

df <- data.frame(y = rnorm(100))
df$A <- matrix(rnorm(300), ncol = 3)
fit <- brm(y ~ A, data = df)

but I get the feeling this is not what you have in mind. If so, could you clarify your question?

Zacco · July 21, 2018, 9:20am

Yes, the issue is not the technical level of inputting a matrix; it is ensuring that the structure of the distance matrix is accounted for in the model.

Here is a toy example with the iris dataset:

library(brms)
data(iris)
#create distance matrix from first two principal components
iris.dist<-as.matrix(dist(prcomp(iris[,1:2],scale=T)$x[,1:2]))
#scale to mean 0, SD 1
iris.dist<-iris.dist-mean(iris.dist)
iris.dist<-iris.dist/sd(iris.dist)

# four columns: Distance, RowID, ColumnID, SameSpecies
iris.dat <- data.frame(matrix(ncol = 4, nrow = 0))
colnames(iris.dat) <- c("Distance","RowID", "ColumnID", "SameSpecies")
for(i in 1:nrow(iris)) {
  for(j in 1:nrow(iris)) {
    if(i!=j) { #do not include distance if same individual is chosen
      iris.dat<-rbind(iris.dat,data.frame(
        Distance=iris.dist[i,j], # distance
        RowID=as.character(i), # row ID
        ColumnID=as.character(j), # column ID
        SameSpecies=as.numeric(iris$Species[i]==iris$Species[j])
        #are the two individuals of the same species?
      ))
    }
  }
}

# do individuals of the same species have a smaller Euclidean distance?
brm.iris<-brm(Distance~SameSpecies+(1|RowID)+(1|ColumnID),
                    family=gaussian,
                    data=iris.dat,cores=4, inits = 0,
                    prior=c(prior("normal(0,2)", "b")))
summary(brm.iris)

The problem here is that each distance appears twice (i.e., as 2,5 and 5,2), but without this repetition the RowID and ColumnID variables do not fully capture the dependencies between variables.

Bob_Carpenter · July 21, 2018, 10:03am

A distance matrix is also going to

- be symmetric,
- have non-negative entries,
- have zero diagonals, and
- satisfy the triangle inequality, $d(a, b) + d(b, c) \geq d(a, c)$.

I’m not exactly sure what you mean by taking that structure into account. It looks like the distance is the observation in your regression, so presumably you’re looking for some structure in those random effects so that you get the same result when switching the row and column IDs. Given the way you set up the regression as SameSpecies + (1 | RowID) + (1 | ColumnID), if you swap row and column IDs, you’ll get the same prediction, because SameSpecies will be the same.
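The four properties listed above can be checked numerically. The following is a minimal base-R sketch; `is_distance_matrix` and its `tol` argument are hypothetical names introduced here for illustration and are not part of brms or base R.

```r
# Hypothetical helper: checks the four distance-matrix properties.
is_distance_matrix <- function(d, tol = 1e-8) {
  n <- nrow(d)
  symmetric    <- max(abs(d - t(d))) <= tol
  non_negative <- all(d >= -tol)
  zero_diag    <- all(abs(diag(d)) <= tol)
  # Triangle inequality: d[a, c] <= d[a, b] + d[b, c] for every a, b, c.
  triangle <- TRUE
  for (b in seq_len(n)) {
    # outer(..., "+") gives d[a, b] + d[b, c] for all a (rows) and c (columns)
    if (any(d > outer(d[, b], d[b, ], "+") + tol)) {
      triangle <- FALSE
      break
    }
  }
  c(symmetric = symmetric, non_negative = non_negative,
    zero_diag = zero_diag, triangle = triangle)
}

# Raw (unstandardized) Euclidean distances from the toy example
iris.raw <- as.matrix(dist(prcomp(iris[, 1:2], scale. = TRUE)$x[, 1:2]))
is_distance_matrix(iris.raw)

# The standardized version used as the regression response
iris.std <- (iris.raw - mean(iris.raw)) / sd(iris.raw)
is_distance_matrix(iris.std)
```

Note that the standardized matrix is still symmetric, but subtracting the mean shifts the diagonal below zero, so it no longer has non-negative entries or a zero diagonal. Symmetry is the property the regression setup has to respect.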

paul.buerkner · July 21, 2018, 10:13am

I suggest using only half of the distance matrix (below the diagonal, say) and then using a multimembership grouping term to account for the fact that rows and columns refer to the same set of locations:

Distance ~ SameSpecies + (1 | mm(RowID, ColumnID))

See also https://journal.r-project.org/archive/2018/RJ-2018-017/index.html for more details about multimembership terms.

Bob_Carpenter · July 21, 2018, 10:46am

That multimembership thing is neat!

I don’t think we’re quite there yet, though. Won’t this formula still wind up counting Distance[i, j] and Distance[j, i] as two observations rather than one, and counting Distance[i, i] as an observation even though we know structurally the result has to be zero in that case?

Zacco · July 21, 2018, 10:47am

A multi-membership grouping term would indeed solve the problem! Perfect, thank you Paul!

This will let me account for the fact that for a hypothetical set of three points X, Y, and Z, the distance between point X and Y is not completely independent of the distance between X and Z or the distance between Y and Z.
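To see why a shared grouping term induces this dependence, here is a small base-R simulation. It is only a sketch under stated assumptions: each pairwise response is taken to be the equally weighted average of the two points' random effects plus independent noise (which, as far as I understand the brms documentation, matches the default equal, sum-to-one weights of `mm()`), and the values of `sigma_u` and `sigma_e` are made up for illustration.

```r
set.seed(1)
n_rep   <- 20000  # number of simulated sets of four points
sigma_u <- 2      # assumed SD of the point-level effects (illustrative)
sigma_e <- 1      # assumed residual SD (illustrative)

# One random effect per point: X, Y, Z, W
u <- matrix(rnorm(n_rep * 4, sd = sigma_u), ncol = 4,
            dimnames = list(NULL, c("X", "Y", "Z", "W")))

# Pairwise response: average of the two members' effects plus noise
pair <- function(a, b) 0.5 * u[, a] + 0.5 * u[, b] + rnorm(n_rep, sd = sigma_e)
d_XY <- pair("X", "Y")
d_XZ <- pair("X", "Z")
d_ZW <- pair("Z", "W")

# Pairs sharing point X: theoretical correlation is
# 0.25 * sigma_u^2 / (0.5 * sigma_u^2 + sigma_e^2) = 1/3
cor(d_XY, d_XZ)

# Pairs with no point in common: theoretical correlation is 0
cor(d_XY, d_ZW)
```

Under these assumptions, two distances are correlated exactly when they share a point, which is the dependence described above.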

By using the multi-membership grouping term, I think Stan/brms could be a potential alternative to the exponential random graph models common in social network analysis, in addition to performing multivariate distance matrix regression (MDMR).

paul.buerkner · July 21, 2018, 10:50am

@Bob_Carpenter That’s why I am saying one needs to use the lower triangular part of the matrix, only.

@Zacco Glad to hear multi-membership terms are helpful to you :-)

Zacco · July 21, 2018, 10:52am

I think I can avoid that by removing the duplicated terms. The code would now look like this:

library(brms)
data(iris)
#create distance matrix from first two principal components
iris.dist<-as.matrix(dist(prcomp(iris[,1:2],scale=T)$x[,1:2]))
#scale to mean 0, SD 1
iris.dist<-iris.dist-mean(iris.dist)
iris.dist<-iris.dist/sd(iris.dist)

# four columns: Distance, RowID, ColumnID, SameSpecies
iris.dat <- data.frame(matrix(ncol = 4, nrow = 0))
colnames(iris.dat) <- c("Distance", "RowID", "ColumnID", "SameSpecies")
for(i in 1:nrow(iris)) {
  for(j in 1:nrow(iris)) {
    if(i<j) { #keep each pair once (i<j is the upper triangle; by symmetry it holds the same values as the lower)
      iris.dat<-rbind(iris.dat,data.frame(
        Distance=iris.dist[i,j], # distance
        RowID=as.character(i), # row ID
        ColumnID=as.character(j), # column ID
        SameSpecies=as.numeric(iris$Species[i]==iris$Species[j])
        #are the two individuals of the same species?
      ))
    }
  }
}

brm.iris<-brm(Distance~SameSpecies+(1|mm(RowID,ColumnID)),
              family=gaussian,
              data=iris.dat,cores=4, inits = 0,
              prior=c(prior("normal(0,2)", "b")))
summary(brm.iris)
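The double loop above grows the data frame one row at a time, which can be slow for larger matrices. As an alternative, here is a vectorized sketch that builds the same kind of long-format data using base R's `lower.tri()` and `which(..., arr.ind = TRUE)`. It takes the entries strictly below the diagonal (i > j) rather than above it; because the matrix is symmetric, the set of distances is the same, only with the RowID and ColumnID roles swapped.

```r
data(iris)
iris.dist <- as.matrix(dist(prcomp(iris[, 1:2], scale. = TRUE)$x[, 1:2]))
# scale to mean 0, SD 1 (as in the posts above)
iris.dist <- (iris.dist - mean(iris.dist)) / sd(iris.dist)

# Row/column indices of every entry strictly below the diagonal (i > j)
idx <- which(lower.tri(iris.dist), arr.ind = TRUE)

iris.dat <- data.frame(
  Distance    = iris.dist[idx],                 # matrix indexing by (row, col) pairs
  RowID       = as.character(idx[, "row"]),
  ColumnID    = as.character(idx[, "col"]),
  SameSpecies = as.numeric(iris$Species[idx[, "row"]] ==
                           iris$Species[idx[, "col"]])
)

# One row per unordered pair: n * (n - 1) / 2 rows for n individuals
nrow(iris.dat)
```

The resulting `iris.dat` can be passed to the same `brm()` call with the `(1 | mm(RowID, ColumnID))` term.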

Bob_Carpenter · July 21, 2018, 10:59am

Thanks for being patient. I missed that. @Zacco’s example cleared up what you meant.

@Zacco, feel free to mark your last post or @paul.buerkner’s as a solution.
