Declaring constrained parameters

stanc

#1

What is the proper way to declare limits and its mean and variances. If I declare these constraints

real<lower=0,upper=1> x;

the sample from a gaussian
x ~ normal(0, 1); <- should this be centered around 0.5 instead like x ~ normal(0.5, 0.5)


#2

To be at least consistent, if you’ve specified parameter x as

real<lower=0,upper=1> x

then if you insist on using a normal distribution as a prior, then it should be a truncated normal like this, following page 82 of the Stan language manual:

x ~ normal(mu, sigma)T[0,1]

where mu and sigma are whatever suits the problem that you are trying to solve. Now you could set mu to zero and sigma to 1, but I doubt that’s a good idea, since such a truncated distribution would have far different behavior from an untruncated distribution with the same values of mu and sigma. Indeed, it’s not clear to me what you are trying to accomplish here.


#3

When the prior has constants for parameters as in normal(0,1) or whatever the truncated distribution and the un-truncated distribution are equivalent logically, because whatever the renormalization factor is, it is also constant.

but if you have parameters mu,sigma and you use normal(mu,sigma), then you need the truncation syntax or some other correction, because as mu,sigma change the renormalization factor changes as well.


#4

Thanks for the explanations. Here is what I am doing. I generated some simulated data for which I know the parameters exactly. I am creating a model to recover these parameters from the data so I know the model is correct.

One of the parameters has mean of 0.6 with variance of 0.09. If I plugin p ~normal(0.6, 0.09), I am pretty sure I would get back a good estimate of the parameter value. However, when applying the model to experimental data, I would not know where to center the normal or know the variance and truncation limits. Setting wide limits seems to have devastating effects on parameter estimation. It is no where close to the input setting even after 2000 iterations. What would an experienced person do. Does it come down to best guess still and run very long iterations?


#5

First off, as a general rule, don’t use hard limits unless you know for sure that your parameter won’t exceed them; e.g. only use a lower bound of zero if you know your prior cannot be negative.

Second, I suspect that setting wide limits isn’t so much the cause of your problems with parameter estimation as it is smoking out other problems in your model and possibly your handling of data.

It would help if you could post your full Stan file, so we could have a better sense of what is wrong.


#6

here is a small section of my whole model, so its easy to explain what I want to do.

functions {
    real logitM(row_vector z, real alpha_tilda, vector gamma_array) {
		real expTerm;
		real m;
    	expTerm = exp(alpha_tilda + (z * gamma_array)); 
    	m = expTerm / (1 + expTerm);
    	return m;
  	}

 	/*
 	**	single time period, single user
 	*/    
	real phi(int C, real alpha_tilda, vector gamma_array, row_vector z) {
		int numberStates = 1;
    	real phiReal = 0;		
    	
  		real m = logitM(z, alpha_tilda, gamma_array);
  		phiReal = ( (C == 1)? m : (1 - m)) ;
      		
		return phiReal;
	}

	real iLogLikelihood_lpmf(int[] C_array, int T, real alpha_tilda, vector gamma_array, row_vector[] z_array){	
		real L_i;
		
		real middleReal; 		
		middleReal = phi(C_array[1], alpha_tilda, gamma_array, z_array[1]) ; //* q_array(0)s

		for(t in 2: T) {
			middleReal = phi(C_array[t], alpha_tilda, gamma_array, z_array[t]) * middleReal;
			if (C_array[t] == 1) break;		// converted. no need to go on
		}
		L_i = log(middleReal);
		return L_i;
	}
}
data {
	int N; // number of users
	int T; // number of days
	int K; // number of predictors. 
	
	int y[N, T]; // outcome
	row_vector[K] z[N, T]; // predictors 

	real alpha_tilda;
}
parameters {
	vector<lower=0,upper=1>[K] gamma;
}

model {
	for(n in 1:N) {
		y[n] ~ iLogLikelihood(T, alpha_tilda, gamma, z[n]);
	}
}

basically, the logit consists the coeffients gamma applied to input events z and added to alpha_tilda to. It is used derive probability of conversion event y.

This model takes 2 hours to run 6000 samples, which by itself is ok, but after I fill in remaining parts, it takes days to run. I would like to get advice whether this can be made to run more efficiently. Once I figure out how to speed this part up, I may be able to speed up the entire model.


#7

A follow up question. This model works very well for higher parameter values where the conversion prob is relatively high. But for low conversion prob, the resolved parameters hover around 0.n, when the true values are around 0.00n. Is there any parameterization tricks to get accurate results in that case, besides feeding a huge size sample set which is not possible to collect in real life.


#8

let me narrow down my question a bit more. Basically I am deriving the beta parameters in a logistic regression.
Exp(alpha + gamma * z) / 1+ exp(alpha + gamma * z)

Gamma is a vector w a small mean and variance.

My question is how best to declare the gamma variable. Ideally I would use a Gaussian prior, but I would not know what to put down as it’s mean. So instead I am using the default uniform prior. Would a Gaussian prior have converged faster? If so how would I declare a Gaussian prior wo knowing the mean?

The posted code has a time series added to the logistic regression, but the basic question is still on gamma parameters themselves.


#9

This a good point to dive into a deeper discussion. I could give up on the constraints, and relie on a small sigma. However what would be a mu that is “appropriet for my problem” when the parameters are coefficients in a logistic regression? Mu is the very thing I am solving for.


#10

There’s an entire chapter of the manual devoted to regression that has examples. The usual assumption is zero centered.


#11

Thanks for pointing out that particular chapter. I read it and it makes sense. In the linear regression section, it discussed the use of QR reparameterization. It seems to me, I could apply that to my logistic regression model as well. Am I right on that?


#12

Yes.


#13

ok I have a question following the line of thought from QR reparametrization and vectorization. In order to implement QR, I am converting the variables from

row_vector[K] x[N, T] ;
int y[N, 2, T];   to

matrix[N, K] x[T];     // there is a T dimenion, because I have a time series problem
vector[N] y[T];

The original loop
for(n in 1:N) {
y[n] ~ iLogLikelihood(alpha_tilda, z, gamma, N, T);
}
to
y ~ iLogLikelihoodf(alpha_tilda, z, gamma, N, T);

However, now I don’t know how to write out the inside of iLogLikelihood_lpdf. I tried write a lpdf function that returns a vector[N], but is rejected. But I don’t know how to convert the vector output that I get into a scalar before returning. Do I arbitrarily insert a sigma and force a normal distribution on y like the std_normal_lpdf example in the manual?


#14

I haven’t been following the whole thread, but I think this is probably because you need to return the sum of the elements in your vector, rather than the vector itself.


#15

I can see the logic of using the product of the elements, but certainly don’t understand about using the sum.


#16

the total log likelihood is the sum of the individual log likelihoods. If
we weren’t working on the log scale it would be a product.


#17

ah yes, my mistake. I am thinking product of likelihood, or sum of log likelihood.