Hi everyone!
I hope you don’t mind a noob question; I’ve recently started exploring Stan. I’m trying to replicate a matrix completion method detailed in this paper, to be used in the prediction of other thermodynamic properties: https://pubs.acs.org/doi/suppl/10.1021/acs.jpclett.9b03657/suppl_file/jz9b03657_si_001.pdf
In any case, I was able to run the following code in CmdStan using variational inference.
data {
  int<lower=0> I;           // solute
  int<lower=0> J;           // solvent
  int<lower=0> K;           // latent dimension
  real ln_gamma[I, J];      // matrix with missing data = -99.0
  real<lower=0> sigma_0;    // prior std dev
  real<lower=0> lambda;     // likelihood scale
}
parameters {
  vector[K] u[I];           // solute feature vectors
  vector[K] v[J];           // solvent feature vectors
}
model {
  // prior: draw feature vectors for all solutes and solvents
  for (i in 1:I)
    u[i] ~ normal(0, sigma_0);
  for (j in 1:J)
    v[j] ~ normal(0, sigma_0);
  // likelihood: model the probability of ln_gamma as a Cauchy distribution
  // around the dot product of the feature vectors
  for (i in 1:I) {
    for (j in 1:J) {
      if (ln_gamma[i, j] != -99.0) {  // train on available data only
        ln_gamma[i, j] ~ cauchy(u[i]' * v[j], lambda);
      }
    }
  }
}
How do I move forward with this in CmdStan to generate the predictions from the feature vectors?
Thank you!
Hi Joshua, welcome to the forums!
If you want to generate the predicted value for each combination of solute and solvent feature vectors, you just need to add a generated quantities block:
generated quantities {
  real ln_gamma_pred[I, J];
  for (i in 1:I) {
    for (j in 1:J) {
      ln_gamma_pred[i, j] = cauchy_rng(u[i]' * v[j], lambda);
    }
  }
}
I’ve used real[I,J] above for consistency with the rest of your code, but you should really use the matrix[I,J] type, as indexing is faster. Additionally, it will be more efficient to use dot_product(u[i], v[j]) rather than the explicit transpose and multiplication.
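Putting both suggestions together, a minimal sketch of the generated quantities block might look like the following (the matrix declaration and dot_product call just illustrate the points above, keeping your variable names):
generated quantities {
  matrix[I, J] ln_gamma_pred;  // matrix type instead of a 2-D real array
  for (i in 1:I) {
    for (j in 1:J) {
      // dot_product avoids the explicit transpose-and-multiply
      ln_gamma_pred[i, j] = cauchy_rng(dot_product(u[i], v[j]), lambda);
    }
  }
}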
Thank you for your prompt reply, andrjohns! Just to confirm, does this mean that I need to compile a separate executable to generate the quantities?
You add the generated quantities block to your model, so your model would then look like:
data {
  int<lower=0> I;           // solute
  int<lower=0> J;           // solvent
  int<lower=0> K;           // latent dimension
  real ln_gamma[I, J];      // matrix with missing data = -99.0
  real<lower=0> sigma_0;    // prior std dev
  real<lower=0> lambda;     // likelihood scale
}
parameters {
  vector[K] u[I];           // solute feature vectors
  vector[K] v[J];           // solvent feature vectors
}
model {
  // prior: draw feature vectors for all solutes and solvents
  for (i in 1:I)
    u[i] ~ normal(0, sigma_0);
  for (j in 1:J)
    v[j] ~ normal(0, sigma_0);
  // likelihood: model the probability of ln_gamma as a Cauchy distribution
  // around the dot product of the feature vectors
  for (i in 1:I) {
    for (j in 1:J) {
      if (ln_gamma[i, j] != -99.0) {  // train on available data only
        ln_gamma[i, j] ~ cauchy(u[i]' * v[j], lambda);
      }
    }
  }
}
generated quantities {
  real ln_gamma_pred[I, J];
  for (i in 1:I) {
    for (j in 1:J) {
      ln_gamma_pred[i, j] = cauchy_rng(u[i]' * v[j], lambda);
    }
  }
}
Then your output will contain the predicted values.
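In case it helps, the CmdStan workflow would look roughly like the sketch below; the names matrix_completion and matrix_completion.data.json are placeholders for whatever you actually call your model and data files, and the commands are run from the CmdStan directory:
# compile the model
make path/to/matrix_completion
# fit with variational inference; draws (parameters and generated quantities) go to output.csv
./path/to/matrix_completion variational data file=path/to/matrix_completion.data.json output file=output.csv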
Thank you! I’ve been experimenting with these. It seems that I get a posterior sample of size 1000. If, say, I want the posterior means, stansummary in CmdStan is the right tool for that, correct?
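For reference, assuming the variational draws were written to output.csv as in the sketch above, CmdStan's stansummary tool prints a summary table whose Mean column gives the posterior means, roughly:
bin/stansummary output.csv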