Extracting correlation from bivariate normal distribution

Trajan · May 4, 2018, 7:34pm

Hi

I’m new to Stan (and pyStan) and am having trouble with a simple example.

My data is two N(0,1) variables x and y, with correlation rho. For each variable I have N observations. I want to use the following model to infer the correlation. In the model I’ve defined a transformed parameter z, which is N(0,1) and uncorrelated to x.

import numpy as np
import pandas as pd
import pystan as pys
import scipy as sp

model_code = """
data {
    int<lower=0> N; // length of data set
    vector[N] x;    // data, N-length vector
    vector[N] y;    // data, N-length vector
}
parameters {
    // correlation parameter, this is to be inferred
    real<lower=-1.0,upper = 1.0> rho;
}
transformed parameters {
    // auxiliary variable, should be N(0,1) and uncorrelated with x 
    vector[N] z;
    z = (y - rho * x) / sqrt(1.0 - rho * rho);
}
model {
// prior distribution
    rho ~ uniform(-1.0, 1.0); 
// likelihood
// Is there an issue with using a transformed parameter in the ikelihoods?
    x ~ normal(0.0, 1.0);
    z ~ normal(0.0, 1.0);
}
"""

model = pys.StanModel(model_code=model_code)
"""
model = pys.StanModel(model_code=model_code)

I’m using this model as follows, but it is not converging to the correct value for the correlation rho. For a true rho of 0.6, the model is converging to 0.35 +/- 0.2

N = 1000

true_correl = 0.6
true_covar = np.array([[1, true_correl],[true_correl, 1]])
observations = sp.stats.multivariate_normal.rvs(np.array([0, 0]), true_covar, size=N)

data_df = pd.DataFrame(observations, columns=['X_obs', 'Y_obs'])
model_dat = {
    'N': N, 
    'x': data_df['X_obs'].values,
    'y': data_df['Y_obs'].values,
}
fit = model.sampling(data=model_dat, iter=5000, warmup=2500, chains=2)

mean_predicted_rho = np.average(fit.extract()['rho']) # gives 0.35 +/- 0.02
print mean_predicted_rho

I know I could to this by using a multi_normal instead, but this is a toy model that I’d like to extend with this structure so I’m keen to get this example working.

I’m not sure where I’ve gone wrong, so any advice would be greatly appreciated.

Thanks

ermeel · May 4, 2018, 7:56pm

The problem I see here is that z depends also additionally on rho, so the Jacobian is not constant. Setting a prior in the model block for a (linearly) transformed parameter would only be ok when the transformation is constant or depends on data only. See also The QR Decomposition For Regression Models.

bgoodri · May 4, 2018, 9:47pm

Yeah, what @ermeel said about the change-of-variables is correct. But you should be able to deal with a bivariate normal distribution by composing a marginal normal distribution for x with a conditional normal distribution for y given x. See

Topic		Replies	Views
Specifying Correlation in Stan Modeling	10	1019	October 9, 2021
Tip for Fitting Multivariate Normal Modeling	0	1602	August 9, 2019
Initial values for optimisation in pystan PyStan	0	1139	February 26, 2019
Stan overestimating covariance in multivariate normal mixture model Modeling	3	932	June 29, 2020
Inference for multinormal on cholesky-transformed data fails to recover sensible estimates Modeling fitting-issues	3	50	January 17, 2025

Extracting correlation from bivariate normal distribution

Related topics