Hi there. So I have a bunch of comments from a forum that all relate to a certain topic. I want to see if there was a noticeable difference in the forum users’ attitude towards that topic after a certain date. To estimate that, I’ve done sentiment analysis on the text content (`sentiment`) & I also have a scaled upvote/downvote score (`score_combined`).
Now, I’m assuming that users’ attitude affects both the sentiment of comments (a more positive attitude means more positive comments) and how comments are up/downvoted based on their sentiment (so, if attitudes are positive, negative comments are downvoted & positive comments are upvoted). So I have two outcome variables here, one of which affects the other.
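Written as equations, using the parameter names from the model below and assuming an index $k[i] \in \{1, 2\}$ for the period of comment $i$ (1 = before the date, 2 = after; this coding is an assumption), that structure looks roughly like this, with the second argument of $\mathcal{N}$ being the standard deviation as in Stan:

$$
\begin{aligned}
\text{sentiment}_i &\sim \mathcal{N}\!\left(a_2 + b_{k[i]}\,c,\ \sigma_2\right) \\
\text{score}_i &\sim \mathcal{N}\!\left(a_1 + b_{k[i]}\,\text{sentiment}_i,\ \sigma_1\right)
\end{aligned}
$$

Here $b_k$ plays the role of the attitude in period $k$: it shifts the mean sentiment (scaled by $c$) and it sets how strongly the score tracks sentiment.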
I’m inexperienced with statistical modelling, so here is my initial attempt at modelling this:
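data{
  int<lower=1> N;                                // number of comments (656 here)
  vector[N] score_combined;                      // scaled upvote/downvote score
  vector[N] sentiment;                           // sentiment score of each comment
  array[N] int<lower=0, upper=1> is_after_date;  // assumed coding: 0 = before, 1 = after
}
parameters{
  real a1;
  real a2;
  vector[2] b;            // b[1]: before the date, b[2]: after the date
  real c;
  real<lower=0> sigma1;
  real<lower=0> sigma2;
}
model{
  vector[N] mu1;
  vector[N] mu2;
  sigma2 ~ exponential( 1 );
  sigma1 ~ exponential( 1 );
  c ~ normal( 0 , 100 );
  b ~ normal( 0 , 10 );
  a2 ~ normal( 0 , 100 );
  a1 ~ normal( 0 , 100 );
  for ( i in 1:N ) {
    // Stan vectors are 1-indexed, so shift the 0/1 indicator to 1/2
    mu2[i] = a2 + b[is_after_date[i] + 1] * c;
  }
  sentiment ~ normal( mu2 , sigma2 );
  for ( i in 1:N ) {
    // score depends on sentiment, with a slope that differs before/after
    mu1[i] = a1 + b[is_after_date[i] + 1] * sentiment[i];
  }
  score_combined ~ normal( mu1 , sigma1 );
}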
So my thinking here is that `b` roughly maps users’ attitude towards the topic (one value before the date, one after), and that this parameter influences both comment sentiment and the upvote/downvote score of comments based on their sentiment. Is this at all the right way of thinking about this sort of problem? Am I in the right neighbourhood? I’d of course appreciate any help.
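In the model as posted, a before/after change in attitude shows up in two places, and both are driven by the same difference $b_2 - b_1$ (taking $b_1$ as "before" and $b_2$ as "after", which is an assumption about the indexing). Working it out from the two linear predictors:

$$
\begin{aligned}
\mathbb{E}[\text{sentiment} \mid k=2] - \mathbb{E}[\text{sentiment} \mid k=1] &= (a_2 + b_2 c) - (a_2 + b_1 c) = (b_2 - b_1)\,c \\
\frac{\partial\,\mathbb{E}[\text{score} \mid \text{sentiment}, k=2]}{\partial\,\text{sentiment}} - \frac{\partial\,\mathbb{E}[\text{score} \mid \text{sentiment}, k=1]}{\partial\,\text{sentiment}} &= b_2 - b_1
\end{aligned}
$$

So under this structure the before/after comparison comes down to the posterior of $b_2 - b_1$ (and of $(b_2 - b_1)\,c$ for the shift in mean sentiment). One consequence: if $c$ is close to zero, the sentiment equation carries little information about $b$, and $b$ is then informed mostly by how the score tracks sentiment.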