Hi
I’m new to Stan (and pyStan) and am having trouble with a simple example.
My data is two N(0,1) variables x and y, with correlation rho. For each variable I have N observations. I want to use the following model to infer the correlation. In the model I’ve defined a transformed parameter z, which is N(0,1) and uncorrelated to x.
import numpy as np
import pandas as pd
import pystan as pys
import scipy as sp
model_code = """
data {
int<lower=0> N; // length of data set
vector[N] x; // data, N-length vector
vector[N] y; // data, N-length vector
}
parameters {
// correlation parameter, this is to be inferred
real<lower=-1.0,upper = 1.0> rho;
}
transformed parameters {
// auxiliary variable, should be N(0,1) and uncorrelated with x
vector[N] z;
z = (y - rho * x) / sqrt(1.0 - rho * rho);
}
model {
// prior distribution
rho ~ uniform(-1.0, 1.0);
// likelihood
// Is there an issue with using a transformed parameter in the ikelihoods?
x ~ normal(0.0, 1.0);
z ~ normal(0.0, 1.0);
}
"""
model = pys.StanModel(model_code=model_code)
"""
model = pys.StanModel(model_code=model_code)
I’m using this model as follows, but it is not converging to the correct value for the correlation rho. For a true rho of 0.6, the model is converging to 0.35 +/- 0.2
N = 1000
true_correl = 0.6
true_covar = np.array([[1, true_correl],[true_correl, 1]])
observations = sp.stats.multivariate_normal.rvs(np.array([0, 0]), true_covar, size=N)
data_df = pd.DataFrame(observations, columns=['X_obs', 'Y_obs'])
model_dat = {
'N': N,
'x': data_df['X_obs'].values,
'y': data_df['Y_obs'].values,
}
fit = model.sampling(data=model_dat, iter=5000, warmup=2500, chains=2)
mean_predicted_rho = np.average(fit.extract()['rho']) # gives 0.35 +/- 0.02
print mean_predicted_rho
I know I could to this by using a multi_normal instead, but this is a toy model that I’d like to extend with this structure so I’m keen to get this example working.
I’m not sure where I’ve gone wrong, so any advice would be greatly appreciated.
Thanks