Hi all,
I am trying to implement a model for detecting “influence” in word usage in House of Commons debates. The idea is to model the words of a given speech as a mixture of (a) that speech’s own word distribution and (b) the word distribution of all previous speeches in a given debate.
The outcome W[v,d] is the count of the times that word v occurs in document d, which is Poisson distributed. On the right-hand side, we have Beta, which is a V by D matrix of word (V) parameters for each document (D), and Theta, which is a D by D parameter matrix that aims to capture the amount that each document “influences” each other document.
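Written out, the model as I intend it is roughly the following. This restates the linear predictor from the transformed parameters block in the code below, with the exponential taken once so that mu is the Poisson mean; here beta_{v,j} and theta_{d,j} denote entries of Beta and Theta, and phi_d is the document length parameter.

```latex
W_{v,d} \sim \mathrm{Poisson}(\mu_{v,d}),
\qquad
\log \mu_{v,d} = \phi_d + \sum_{j=1}^{D} \beta_{v,j}\,\theta_{d,j}
```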
Currently, each row in the influence matrix is drawn from a Dirichlet distribution (Theta[d,] ~ dirichlet(delta)), where delta is a hyper-parameter vector of length D. But this set-up implies that documents that occur later can influence those that occur earlier. I would like to be able to constrain some elements of Theta to be 0, such that I have something like the following (for D = 3):
            Document_1  Document_2  Document_3
Document_1       1.000       0.000       0.000
Document_2       0.367       0.633       0.000
Document_3       0.215       0.196       0.589
That is, row d of Theta should be a Dirichlet draw of length d: length 1 for document 1 (which is trivially 1), length 2 for document 2, and so on, with the remaining entries of each row fixed at 0.
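In symbols, what I am after is roughly the following. Using the first d entries of delta as the concentration parameters for row d is an assumption on my part about how the prior would be truncated; theta_{d,j} denotes the (d, j) entry of Theta.

```latex
\theta_{d,j} = 0 \quad \text{for } j > d,
\qquad
(\theta_{d,1}, \dots, \theta_{d,d}) \sim \mathrm{Dirichlet}(\delta_1, \dots, \delta_d)
```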
I have tried using sub_row, block and segment to draw only from subsets of my delta Dirichlet prior and assign to subsets of Theta, but to no avail. I’ve also tried zeroing out the cells of Theta that are not necessary in the transformed parameters block, but then of course the rows of Theta no longer sum to 1.
Any pointers on a way forward with this would be great. I’ve included the full model code below. Apologies if there is not enough detail here.
Thanks,
Jack
data {
  int<lower=2> V;      // num words
  int<lower=1> D;      // num docs
  int<lower=0> W[V,D]; // document-feature matrix as integer array
}
parameters {
  real phi[D];            // document length parameter
  matrix[V,D] Beta;       // standardized word parameter for document d
  simplex[D] Theta[D];    // row d: influence weights on document d
  vector<lower=0>[D] delta; // Dirichlet hyper-parameter
}
transformed parameters {
  matrix[V,D] lp;
  matrix<lower=0>[V,D] mu;
  for (d in 1:D) {
    for (v in 1:V) {
      // Linear predictor
      lp[v,d] = phi[d] + dot_product(Beta[v], Theta[d]);
      // Mean
      mu[v,d] = exp(lp[v,d]);
    }
  }
}
model {
  // priors
  for (d in 1:D) {
    Beta[,d] ~ normal(0, 2);
    phi[d] ~ normal(0, 2);
    Theta[d] ~ dirichlet(delta);
  }
  // likelihood
  for (d in 1:D) {
    for (v in 1:V) {
      // mu is already exp(lp), so it is not exponentiated again here
      W[v,d] ~ poisson(mu[v,d]);
    }
  }
}
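For what it is worth, one construction that might give the triangular structure described above is to normalize independent gamma variates, since normalizing Gamma(delta_j, 1) draws yields a Dirichlet(delta) vector. The following is only an untested sketch of the prior on Theta, without the likelihood. The names g and pos and the exponential prior on delta are assumptions for illustration and are not part of the model above.

```stan
data {
  int<lower=1> D; // num docs
}
parameters {
  // g is a hypothetical helper: unnormalized positive weights for the
  // D * (D + 1) / 2 cells on or below the diagonal of Theta
  // (D * (D + 1) is always even, so the integer division is exact)
  vector<lower=0>[D * (D + 1) / 2] g;
  vector<lower=0>[D] delta;
}
transformed parameters {
  matrix[D, D] Theta = rep_matrix(0, D, D); // cells above the diagonal stay 0
  {
    int pos = 1; // start of row d's slice of g
    for (d in 1:D) {
      // normalize row d's weights so that Theta[d, 1:d] sums to 1
      Theta[d, 1:d] = g[pos:(pos + d - 1)]' / sum(g[pos:(pos + d - 1)]);
      pos += d;
    }
  }
}
model {
  int pos = 1;
  delta ~ exponential(1); // placeholder prior (assumption, not in the model above)
  for (d in 1:D) {
    // gamma(delta[j], 1) weights, once normalized, are Dirichlet(delta[1:d])
    g[pos:(pos + d - 1)] ~ gamma(delta[1:d], 1);
    pos += d;
  }
}
```

With Theta declared as a matrix like this, row d would be a row_vector, so the linear predictor could presumably keep the form dot_product(Beta[v], Theta[d]). The row sums of g are not informed by the likelihood, only by their gamma prior, which may or may not matter for sampling efficiency.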