Point estimates

jzh · September 26, 2017, 9:32am

Hi,
I am try to do a multivariate Bayesian regression as follows
r = S*w + e
where r is a 9x1 vector, S is a 9x13 matrix, w is a 13x1 vector and e ~ t(v, 0, Sigma), i.e. an multivariate t distribution. I am using Rstan. The data samples for the simulation ranges from 1000-5000. i.e. an expanding window approach where the likelihood will include 1000 to 5000 samples. However, I can only get results for the first 10 runs or so(with samples 1000-1010). Then I cannot get any result back. The console tracing of the simulation shows no error message. From my previous experience, it is a typical sign of running out of memory. The stan code I used is the following :

		data {
			int<lower=1> K;  
			int<lower=1> M;   
			int<lower=1> N;   // # of samples points, usually 5000 
			vector[K] r[N];   
			matrix[K,M] S[N]; 
			int<lower=1> s_i; // usually 0
			int<lower=s_i+1> e_i; // start from 1000 then increase w each run
		}

		parameters {
			vector[M] w;
			real<lower=2, upper=20> nu;
			cholesky_factor_corr[K] L_Omega;
			vector<lower=0, upper=pi()/2>[K] sigma_unif;
		}

		transformed parameters {
			vector<lower=0>[K] sigma;
			for (k in 1:K) sigma[k] = 2.5*tan(sigma_unif[k]);
		}

		model {
			matrix[K, K] Sigma;
			w ~ normal(0, 1);
			L_Omega ~ lkj_corr_cholesky(2);
			Sigma = tcrossprod(diag_pre_multiply(sigma, L_Omega));
			for (i in s_i:e_i) {
				r[i] ~ multi_student_t(nu, S[i]*w, Sigma);
			}
		} 

		generated quantities {
			real total_err;
			total_err = sqrt(quad_form(tcrossprod(L_Omega), sigma)*nu/(nu-2));
		}

in order to get some simple result back I will try optimizing(…) to get an MAP estimation. But I previous experience with optimizing is the result often contain jumps which is not surprising considering it is an estimate of local minimal. Then according to the stan documentation. I can also get other point estimates like posterior mean and median. The document refers to each interface docs for details. However, I can not find anything on rstan document indicating how can I get these posterior mean and median. Can someone shed some light here. Also, about the memory consumption. A problem like I have what size of RAM is needed on a Linux machine?

bgoodri · September 26, 2017, 12:07pm

You should not run out of memory at 1010 observations for this model. What is the console output?

jzh · September 26, 2017, 2:24pm

Hi Ben,
The console does not show any error message, it actually showed the simulation had finished. But when I try to the get the result back via mean(extract(fit,‘w[1]’)$‘w[1]’) for example, I get nothing.

Jicun

bgoodri · September 26, 2017, 2:45pm

Try it with just one chain, but it seems odd that it would have enough memory to run and then exhaust its memory trying to return the output.

ahartikainen · September 26, 2017, 2:53pm

How do you call sampling to get a fit? How long does the sampling take?

jzh · September 26, 2017, 2:54pm

One thing I need to mention is I am running rstan via Rserve, which might have some extra cost. I will try to set chains =1. Currently I am using optimizing to get the MAP estimate. Do you have any idea how to get posterior mean/median as indicated in stan doc 31.2 and 31.3 ?

Jicun

jzh · September 26, 2017, 2:56pm

fit<-stan(model_code=stan_code, data=stan_data, iter=3000)
It takes at least 30min.

jicun

ahartikainen · September 26, 2017, 3:01pm

You get summary statistic table by

print(fit)

Posterior mean / median are just mean / median of the mcmc sample.

bgoodri · September 26, 2017, 3:49pm

There is no way to get the posterior mean or median from the posterior mode unless you just assume all three are equal, which is rarely the case in non-trivial models. We need to get the Rserve thing figured out so that you can use a genuine posterior distribution.

jzh · September 26, 2017, 8:35pm

OK, that is perfectly reasonable, no magic. Thanks for clarification!

jicun

jzh · September 26, 2017, 8:40pm

Yes samples from the posterior solve everything. I have encountered similar problems before. After I increase the memory for the machine(virtual), the problem is gone. At that time, I had multiple Rserve sessions which correspond to multiple processes in the Linux machine. Now I have cut down to 1 session only, the next step is to cut down chains.

jicun

Topic		Replies	Views
Multivariate analysis Modeling rstan , fitting-issues , specification	4	532	April 22, 2021
Unexpected error when running Rstan in Windows computer Modeling	5	458	April 20, 2022
Rstan results from MLE (optimizing) and MCMC (stan) do not match General rstan , techniques , fitting-issues	7	505	October 18, 2020
Launch_shinystan rstanarm shinystan	4	938	December 19, 2018
Variational Bayes - rstan - Error: cannot allocate vector of size 250.0 Gb Algorithms	3	582	September 13, 2019

Point estimates

Related topics