Non-linear least squares fitting Issue

Hi,

I am new to stan and was hoping to use it to fit some mass spec data. I have normalized the data and believe that it should be the addition of two skewed gaussians. I was working from this guide and modified it to be the sum of two skewed gaussians. I have tried the following code but seem to be misunderstanding how to correctly define my model. Any help would be appreciated!

Best,
Matt

stan_code="""
data {
 int<lower = 0> N;
 vector[N] y;
}

parameters {
  ordered[2] xi;
  real<lower=0> alpha[2];
  real<lower=0> omega[2];

}

model {
 xi ~ normal(0,1);
 alpha ~ normal(0, 2);
 omega ~ normal(0,2);
 for (n in 1:N)

  target += log_sum_exp(skew_normal_lpdf(y[n] | xi[1], alpha[1],omega[1]),
                        skew_normal_lpdf(y[n] | xi[2], alpha[2],omega[2]));

}
"""

Attached is the data
data1.csv (9.2 KB)


I have tried to simply my model to fit a single gaussian and basing it off of this post. I am still unable to fit the model unfortunately

"""

functions {
    real phi(real x, real mu, real sigma){
        return 1 / (sigma * sqrt(2 * pi())) * exp(-(x - mu)^2/(2 * sigma^2));
    }
}

data {
 int<lower = 0> N;
 vector[N] y;
 vector[N] x;
}


parameters {
  real m;
  real<lower=0> sigma;
  real<lower=0> s;
}



model {
    //priors
    m ~ normal(0, 1);
    s ~ normal(0,1);
    sigma ~ normal(0,1);

    for(i in 1:N){
        y[i] ~ normal(phi(x[N], m, s),  sigma);
}
}
"""

Hey Matt,

I haven’t tried fitting your model myself, but what happens if you do?

The model takes long to fit? What does eg CmdStan’s diagnose method say?

Thanks for the reply Funko! The model is able to be built but I receive the following error message when I run it using pystan

  Adjust your expectations accordingly!
  Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
  Exception: normal_lpdf: Location parameter is nan, but must be finite! (in '/var/folders/6h/9f7j8fsj22v6ddn590l30hrr0000gn/T/httpstan_4mw73x6z/model_dyuyrw4x.stan', line 5, column 8 to column 77)
  If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
  but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
  Gradient evaluation took 0.000369 seconds
  1000 transitions using 10 leapfrog steps per transition would take 3.69 seconds.
  Adjust your expectations accordingly!
  Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
  Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in '/var/folders/6h/9f7j8fsj22v6ddn590l30hrr0000gn/T/httpstan_4mw73x6z/model_dyuyrw4x.stan', line 5, column 8 to column 77)
  If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
  but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.
<stan.Fit>
Parameters:
    m: ()
    sigma: ()
    s: ()
Draws: 2000

Here is the code I am running in PyStan

df = pd.read_csv('data.csv',header=0,index_col=0)

yCol = np.array(df['Y']).reshape(-1, 1)

from sklearn.preprocessing import MinMaxScaler
minmax_scale = MinMaxScaler().fit(yCol)
df_minmax = minmax_scale.transform(yCol)




xDist=np.arange(0,len(df_minmax[:,0]))

curve_data={'N':yCol.shape[0],'y':df_minmax[:,0],'x':xDist}

posterior = stan.build(epsilon_code,data=curve_data,random_seed=1)
fit = posterior.sample(num_chains=2, num_samples=1000)
print(fit)
1 Like

There’s a bit of a conceptual gap here. You have data whereby you have a measured absorption (or maybe the reverse?) at each of many wavelengths and you are trying to do inference on the underlying functional form of the data. Visually, the data look like they trace out the curve reflecting the probability density function of a mixture of two skew-normals, but that is different from modelling the data as a skew-normal-mixture. Think of the skew-normal (or a mixture of multiple skew-normals) as a convienient parameterized shape that expresses your expectation (f(x)) for what you will observe at each wavelength under conditions of zero measurement noise. What that framing highlights then is that you still need to add several quantities before you are actually have a proper generative model of your data.

Foremost is that you need a structure to convey that your real data indeed have measurement noise. You’ll have to use your domain expertise to establish what that structure looks like; normal probably doesn’t make sense given that zero is presumably a hard lower bound for measurement. Maybe some form of beta distribution with the shape influenced by f(x) so that for f(x)=0 the noise has a positive skew and for f(x)=1 the noise has a negative skew. But regardless of the noise model, your code should end up looking like:

data {
 int<lower = 0> N;
 vector[N] y;
 vector[N] x; // you were missing this!
}
parameters {
 ... // some parameters associated with the latent function
 ... // one or more parameters associated with the noise model
}
model {
  ... // priors
  vector[N] fx = f(x,f_parameters) ; // shorthand for a set of operations relating x to the latent function
  target+= noise_model_lpdf(y | fx, noise_parameters) ; // shorthand for the noise model 
}

Next you also need either a mixture proportion to capture the fact that clearly the two peaks are at different heights. I’ve never used a mixture-PDF as the form of a latent function, but my intuition is that you can use log-mix but you will want to do exp(log_mix(...)) so that you get the un-logged PDF curve values; the log-scale shape of the PDF will be very different and not what you’re looking for.

4 Likes

Hm, I don’t actually now pystan that well. The output does not actually look too bad, or does it? What does arviz.summary return?

I tried running your code, but the data1.csv does not fit to the script.

Regardless, do look into what @mike-lawrence said, I did not think that hard about this yet but I think he’s right.

There doesn’t seem to be any noise in your simulated data, which makes this an exercise in curve fitting rather than probabilistic modeling. Is measurement error in your real data really negligible? If it is you don’t need Stan, although its optimization functionality might be usable. If it’s not you should add some to your fake data.

In the code in your second post the subscript on x should be i, not N.

Why do you believe that low-noise data changes the need for proper propagation of uncertainty?

Because without some random component there’s no uncertainty to propagate, just unknowns to solve for. The original hypothesis was the observed curve could be fitted with the weighted sum of two curves that look like skew normal densities. It might be fine to use least squares as an optimization criterion but there would be no probabilistic interpretation of the solution.

I’m guessing there’s measurement error in the real data and a properly formulated Stan model would be an appropriate tool. It’s probably not going to work well with simulated perfect data though.

Ok, so long as you were referring to simulated-data scenario only (and as you say, simulated data where no measurement noise structure has been incorporated to the simulation). Since the real world always has noise it is always best to propagate the uncertainty that noise induces formally