Hi Martin,
thanks for the response :-)
So what we are trying to do is the following: For our protein of interest, we have multiple sequences of the same protein. We would like to find out the whether two positions are related to each other (hence interacting with each other) by checking how often they co-vary among all the sequences.
I tried to simulate the problem. In the following code I assume that category 1 and 4 co-vary together.
# simulate observation of 500 category pairs
N <- 500
prob_cat_a <- c(1,1,2,1)
pa <- softmax(prob_cat_a[1], prob_cat_a[2], prob_cat_a[3], prob_cat_a[4])
cat_pos_a <- rep(NA,N)
set.seed(34302)
for ( i in 1:N ) cat_pos_a[i] <- sample( 1:4 , size = 1, prob=pa )
cat_pos_b <- rep(NA,N)
for ( i in 1:N ) {
if (cat_pos_a[i] == 1 ) {
prob_cat_b <- c(1,1,1,5)
}
else if ( cat_pos_a[i] == 4 ) {
prob_cat_b <- c(5,1,1,1)
}
else {
prob_cat_b <- c(1,1,1,1)
}
pb <- softmax(prob_cat_b[1], prob_cat_b[2], prob_cat_b[3], prob_cat_b[4])
cat_pos_b[i] <- sample( 1:4 , size = 1, prob=pb )
}
I thought I start with indicator variables for all 4 categories, because thats easier for me to understand. Once I got the model running, I want to switch to index variables to make it more flexible. Anyway, the model is not running, here is what I did so far:
code <- "
data{
int N; // number of pairs
int K; // number of possible categories
int cat_pos_a[N]; // outcome
int posb_1[N]; // predictor indicator for class 1
int posb_2[N]; // predictor indicator for class 2
int posb_3[N]; // predictor indicator for class 3
int posb_4[N]; // predictor indicator for class 4
}
parameters {
vector[K-1] a; // intercepts
vector[K-1] b1;
vector[K-1] b2;
vector[K-1] b3;
vector[K-1] b4;
}
model {
vector[K] p;
vector[K] s;
a ~ normal(0, 1);
b1 ~ normal(0, 5);
b2 ~ normal(0, 5);
b3 ~ normal(0, 5);
b4 ~ normal(0, 5);
s[1] = a[1] + b1*posb_1 + b2*posb_2 + b3*posb_3 + b4*posb_4;
s[2] = a[2] + b1*posb_1 + b2*posb_2 + b3*posb_3 + b4*posb_4;
s[3] = a[3] + b1*posb_1 + b2*posb_2 + b3*posb_3 + b4*posb_4;
s[4] = 0; // Pivot
p = softmax( s );
cat_pos_a ~ categorical( p );
}
"
"
This is how I run it:
dat_list <- list(N=N,
K=4,
cat_pos_a=cat_pos_a,
posb_1=ifelse ( cat_pos_b==1, 1, 0),
posb_2=ifelse ( cat_pos_b==2, 1, 0),
posb_3=ifelse ( cat_pos_b==3, 1, 0),
posb_4=ifelse ( cat_pos_b==4, 1, 0)
)
m <- stan(model_code=code, data=dat_list, chains=4)
And this is what I get:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
No matches for:
vector * int[]
Available argument signatures for operator*:
real * real
vector * real
row vector * real
matrix * real
row vector * vector
vector * row vector
matrix * vector
row vector * matrix
matrix * matrix
real * vector
real * row vector
real * matrix
No matches for:
real + ill formed
Available argument signatures for operator+:
int + int
real + real
vector + vector
row vector + row vector
matrix + matrix
vector + real
row vector + real
matrix + real
real + vector
real + row vector
real + matrix
+int
+real
+vector
+row vector
+matrix
No matches for:
vector * int[]
Available argument signatures for operator*:
real * real
vector * real
row vector * real
matrix * real
row vector * vector
vector * row vector
matrix * vector
row vector * matrix
matrix * matrix
real * vector
real * row vector
real * matrix
No matches for:
ill formed + ill formed
Available argument signatures for operator+:
int + int
real + real
vector + vector
row vector + row vector
matrix + matrix
vector + real
row vector + real
matrix + real
real + vector
real + row vector
real + matrix
+int
+real
+vector
+row vector
+matrix
No matches for:
vector * int[]
Available argument signatures for operator*:
real * real
vector * real
row vector * real
matrix * real
row vector * vector
vector * row vector
matrix * vector
row vector * matrix
matrix * matrix
real * vector
real * row vector
real * matrix
No matches for:
ill formed + ill formed
Available argument signatures for operator+:
int + int
real + real
vector + vector
row vector + row vector
matrix + matrix
vector + real
row vector + real
matrix + real
real + vector
real + row vector
real + matrix
+int
+real
+vector
+row vector
+matrix
No matches for:
vector * int[]
Available argument signatures for operator*:
real * real
vector * real
row vector * real
matrix * real
row vector * vector
vector * row vector
matrix * vector
row vector * matrix
matrix * matrix
real * vector
real * row vector
real * matrix
No matches for:
ill formed + ill formed
Available argument signatures for operator+:
int + int
real + real
vector + vector
row vector + row vector
matrix + matrix
vector + real
row vector + real
matrix + real
real + vector
real + row vector
real + matrix
+int
+real
+vector
+row vector
+matrix
Expression is ill formed
error in 'model1bc53051cc07_b9dbde3639292f53cacf9bde5d2a0691' at line 30, column 63
-------------------------------------------------
28: b4 ~ normal(0, 5);
29:
30: s[1] = a[1] + b1*posb_1 + b2*posb_2 + b3*posb_3 + b4*posb_4;
^
31: s[2] = a[2] + b1*posb_1 + b2*posb_2 + b3*posb_3 + b4*posb_4;
-------------------------------------------------
I’m looking forward to your comments. It seems that I really doing something wrong syntax wise.
As stan novice I really appreciate your willingness to help me :-)
Best,
Thorsten