# Identifiability in a structural reliability model

Hi all,

I am currently trying to implement a colleague’s structural reliability code in Stan. The gist is as follows: we have a database of successes and failures y and their associated measured predictors x. Reliability theory assumes that these outcomes can be explained by a limit state function g(x, \theta), where \theta is a vector of model parameters. The limit state function is negative for failures and positive for successes. The idea was to use Bayesian updating to determine the posterior distributions of the model parameters.

The original derivation of the likelihood looks, to me, almost identical to a latent variable probit model:

First, they added a model error term to account for an imperfect specification of the limit state function.

g_i = \hat{g}(x_i, \theta) + \epsilon_i, \quad \epsilon_i \sim N(0,\sigma_e)

The original data had six predictor variables and used the following form of the limit state:

\hat{g}(x, \theta) = x_1(1+\theta_1 x_2) + \theta_2 x_2 + x_3(1+\theta_3 x_2) - \theta_4 \ln(x_4) - \theta_5 \ln(x_5) - \theta_6 \ln(x_6) - \theta_7
Pr(y = 0 \mid x) = Pr(g < 0 \mid x) = \Phi\left(-\frac{\hat{g}(x, \theta)}{\sigma_e}\right)
Pr(y = 1 \mid x) = Pr(g > 0 \mid x) = \Phi\left(\frac{\hat{g}(x, \theta)}{\sigma_e}\right)
\ell(\theta, \sigma_e \mid x, y) = \prod_{i = 1}^{k} \Phi\left(-\frac{\hat{g}(x_i, \theta)}{\sigma_e}\right) \prod_{i = k + 1}^{n} \Phi\left(\frac{\hat{g}(x_i, \theta)}{\sigma_e}\right), where the data are ordered so the first k cases are failures and n is the total number of cases.
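As a sanity check on my derivation, here is the same likelihood sketched in Python with NumPy/SciPy (the function names and any test values are mine, not from the original code):

```python
import numpy as np
from scipy.stats import norm

def g_hat(x, theta):
    """Limit state \\hat{g}(x, theta); x has columns x1..x6."""
    x1, x2, x3, x4, x5, x6 = x.T
    return (x1 * (1 + theta[0] * x2) + theta[1] * x2
            + x3 * (1 + theta[2] * x2)
            - theta[3] * np.log(x4) - theta[4] * np.log(x5)
            - theta[5] * np.log(x6) - theta[6])

def log_likelihood(theta, sigma_e, x, y):
    """Sum of log Phi(+z) over successes and log Phi(-z) over failures,
    with z = g_hat / sigma_e."""
    z = g_hat(x, theta) / sigma_e
    return np.sum(norm.logcdf(np.where(y == 1, z, -z)))
```

Working on the log scale (`norm.logcdf`) avoids underflow when \Phi is close to 0.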

Now for my question. It is my understanding that in the latent variable motivation for probit models the scale parameter is not identifiable, because multiplying it and the coefficients by the same positive constant gives an identical likelihood. Does this formulation have the same limitation?
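To make the concern concrete: in the ordinary linear probit case the invariance is easy to verify numerically (illustrative values, not the original data):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3))
beta = np.array([0.5, -1.2, 2.0])
sigma, alpha = 0.7, 3.0

# scaling beta and sigma by the same alpha leaves every probability unchanged
p1 = norm.cdf(X @ beta / sigma)
p2 = norm.cdf(X @ (alpha * beta) / (alpha * sigma))
print(np.allclose(p1, p2))  # prints True
```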

The first and the third terms in the definition of \hat{g} are not linear in \theta but affine: they contain the constant offsets x_1 and x_3, which do not scale with \theta.

Hence, for any scaling factor \alpha > 0 with \alpha \neq 1, it follows that

\frac{\hat{g}(x, \alpha \theta)}{\alpha \sigma_e} \neq \frac{\hat{g}(x, \theta)}{\sigma_e} .

Thus I think the problem you describe does not occur.
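One can check this numerically for the limit state above (illustrative parameter and predictor values only):

```python
import numpy as np

def g_hat(x, theta):
    # limit state from the original post; x = (x1, ..., x6)
    x1, x2, x3, x4, x5, x6 = x
    return (x1 * (1 + theta[0] * x2) + theta[1] * x2
            + x3 * (1 + theta[2] * x2)
            - theta[3] * np.log(x4) - theta[4] * np.log(x5)
            - theta[5] * np.log(x6) - theta[6])

x = np.array([1.5, 0.8, 2.0, 1.2, 1.1, 1.3])
theta = np.ones(7)
sigma, alpha = 0.5, 2.0

# the x1 and x3 offsets do not scale with theta, so the ratio changes
r1 = g_hat(x, theta) / sigma
r2 = g_hat(x, alpha * theta) / (alpha * sigma)
print(r1, r2)  # the two ratios differ
```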

I am not sure, but if the first and the third terms of \hat{g} were linear in \theta, then, as you said, sampling would fail: only the ratio \theta / \sigma_e would be identified, and the scale indeterminacy you describe would occur.

Oh OK, I see that now – that answers my question.

Jumping in on this again, here is the Stan code I’m using to estimate the model coefficients.

```stan
data {
  int<lower=1> N;                    // number of data points
  vector[N] x1;
  vector[N] x2;
  vector[N] x3;
  vector[N] x4;
  vector[N] x5;
  vector[N] x6;
  array[N] int<lower=0, upper=1> y;  // 1 = success, 0 = failure
}
parameters {
  vector[7] t;
  real<lower=0> sig_e;
}
model {
  vector[N] beta;
  for (i in 1:N) {
    beta[i] = (x1[i] * (1 + t[1] * x2[i]) + t[2] * x2[i]
               + x3[i] * (1 + t[3] * x2[i]) - t[4] * log(x4[i])
               - t[5] * log(x5[i]) - t[6] * log(x6[i])
               - t[7]) / sig_e;      // limit state formulation
  }

  t ~ normal(0, 10);
  sig_e ~ normal(0, 1);
  for (i in 1:N) {
    if (y[i] == 1) {
      target += log(Phi(beta[i]));   // likelihood, success
    } else {
      target += log(Phi(-beta[i]));  // likelihood, failure
    }
  }
}
```



The code runs fine, but I’m not getting anywhere near the coefficients from the earlier work. I suspect an algebraic error or a mis-coded likelihood. Does anyone spot anything?