# Sum of sinusiods modelling, label switching

Hello everyone I’m quite new to the stan programing and I’m at the moment attempting to fit time series data to the sum of two sinusoids (possible more in the future). My main problem is label switching as running one chain of the model works fine, but as soon as 4 chains are running the model breaks down.

My model has the following structure:
y(t)=\sum_{j=1}^{k}(A_ j\cdot cos(2\pi f_jt+2\pi\phi_j )\cdot \exp(-a_jt) ).

I assume the following priors for each parameter (a,A,\phi,f) :
a\sim N(0,2.2)
A\sim N(0,\Sigma)
\phi\sim U(0,1)
f\sim U(0,15)

My likelihood I presume to be a normal distribution with a single variance parameter and a sum of two mean values:
y|(a,A,\phi,f)=N(\mu_1+\mu_2,\sigma^2)

where \mu_1=A_ 1\cdot cos(2\pi f_1t+2\pi\phi_1 )\cdot \exp(-a_1t) and \mu_2=A_ 2\cdot cos(2\pi f_2t+2\pi\phi_2 )\cdot \exp(-a_2t)
and finally \sigma^2\sim InvGamma(2,10)

I simulate my data as the following: (utilizing rstan)

# two sinusoid models #

# data and model params
N=100 # points
dt=0.1 # dwell time
t_max=0.99

# time points
j_idx=seq(0,t_max,length.out = N)
time=j_idx*1

iu=as.complex(1i) # complex imaguinary unit

# component 1 parms
A_1=1
phi_1=0.2
f_1=10
alpha_1=3

# component 2 parms
A_2=5
phi_2=0.5
f_2=8
alpha_2=1

# component 1 and 2
y1=cos(2*pi*time*f_1+2*pi*phi_1)*exp(-alpha_1*time)*A_1
y2=cos(2*pi*time*f_2+2*pi*phi_2)*exp(-alpha_2*time)*A_2

# full data
y=y1+y2

y_noise=y+rnorm(length(y),0,0.1) # data with noise

library(loo)
library(rstan)
library(RcppEigen)
library(bayesplot)
library(gridExtra)
library(rstanarm)

# data set up for rstan
dat=list(N = length(y_noise),
time=c(t(time)),
k=2,
y = c(t(Re(y_noise))))

# fitting with rstan - note set your own working directory or paste in the stan model
fit1 <- stan(file = 'testing stan_2_15_5M.stan',cores=1,chains = 1, data = dat,iter=2000,control = list(adapt_delta=0.8,max_treedepth=10)) # using 1 chain fit is fine

fit2<- stan(file = 'testing stan_2_15_5M.stan',cores=4,chains = 4, data = dat,iter=2000,control = list(adapt_delta=0.8,max_treedepth=10)) # using 4 chains label switching



The code of the stan script is as of the following

data{
int k; // no of signals
int N;// total number of points
vector[N] time; // observed time points
vector[N] y; // observed signal values
}

parameters{
vector <lower=0> [k] A; // signal amplitude
vector <lower=0> [k] a; // signal decay constant
vector <lower=0> [k] phi; // phase of signals
vector <lower=0> [k] f; // frequency of signals
real<lower=0> sigma; // signal error
}

model {
vector[N] B1;
vector[N] B2;

for (i in 1:N){
B1[i]=A[1]*cos(2*pi()*time[i]*f[1]+2*pi()*phi[1] )*exp(-a[1]*time[i]);
B2[i]=A[2]*cos(2*pi()*time[i]*f[2]+2*pi()*phi[2] )*exp(-a[2]*time[i]);
}
// priors
a~normal(0,2.2);
A[1]~normal(0,sqrt(dot_product(B1',B1)));
A[2]~normal(0,sqrt(dot_product(B2',B2)));
phi~uniform(0,1);
f~uniform(0,15);

// posterior
y~normal(B1+B2,sigma);
}


The output of the the 1 chain fit is stating close to the correct values of the parameters, whilst the 4 chain is much off. Utilizing some diagnostics from the 4 chain run I found the following:

none of the parameter chains are mixing - I tried to use ordered "insert parameter" [k] and also postive_ordered [k] but that did not work. Does anyone have a suggestion as to what might be done about the problem? :)

Welcome to the Stan forum!

First off, Bayesian estimation of mixture models is hard, and one possible conclusion is that a particular model simply can’t be estimated well.

Having said this, one basic way to reduce the label switching problem is to put ordering constraints on the variables. (probably phi in your model)

Most importantly: @betanalpha’s case study on identifying mixture models describes this and other methods. The case study will also use you with a useful does of scepticism about the ability to estimate mixture models. (With which don’t mean that it can’t be done, but that it is not trivial.)

One can also try to deal with the label switching in the post-processing or the generated quantities block. But I would first try to specify the model in a way that this is not necessary.

2 Likes

I did get this to work though this is unlikely to work if you add another time series. I’m also doubtful on if this works on real data.

The issue with these models is that as you add more parameters there’s an increase in the number of combinations that would fit the data equally well. You must put fairly strong domain knowledge or other constraints onto the parameters to remove the label switching. However, at the point that these models become identifiable you almost already know what the values of the parameters are. So the utility of the model is low and this becomes especially relevant as the order of the parameter space increases.

So what did I do.

The least offensive changes were adding positive ordered constraints to A, a,phi, and f. Even with this the model was not identified due to f. You can see in the pairs plot that f could reasonably be [8.9, 9.0] or [8.0, 10.0],

where the second vector is the true parameter values. I added a repulsive prior (see Mixture Models) so that components of f near each other are “repulsed”. This is unlikely to work well if the dimension of f were > 2.

The output running on 4 chains is

> fit1
Inference for Stan model: sine_mix_forum.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
A[1]      0.52    0.00 0.03   0.45   0.50   0.52   0.54   0.59  1866    1
A[2]      5.23    0.00 0.06   5.12   5.19   5.23   5.27   5.35  2145    1
a[1]      1.05    0.00 0.04   0.95   1.03   1.05   1.08   1.12  1422    1
a[2]      1.08    0.00 0.03   1.03   1.07   1.08   1.10   1.13  2033    1
phi_raw   0.17    0.00 0.02   0.14   0.16   0.17   0.19   0.21  1479    1
dphi      0.32    0.00 0.02   0.29   0.31   0.32   0.33   0.35  1483    1
f_raw     8.01    0.00 0.00   8.00   8.00   8.01   8.01   8.01  5698    1
df        2.12    0.00 0.05   2.01   2.09   2.12   2.16   2.22  1631    1
sigma     0.15    0.00 0.01   0.12   0.14   0.14   0.15   0.17  2466    1
phi[1]    0.17    0.00 0.02   0.14   0.16   0.17   0.19   0.21  1479    1
phi[2]    0.49    0.00 0.00   0.49   0.49   0.49   0.50   0.50  4498    1
f[1]      8.01    0.00 0.00   8.00   8.00   8.01   8.01   8.01  5698    1
f[2]     10.13    0.00 0.05  10.02  10.09  10.13  10.16  10.23  1645    1
lp__    135.03    0.06 2.31 129.69 133.65 135.42 136.73 138.44  1355    1


with a pairs plot of f now

Stan code

functions {
real repulsive_lpdf(vector mu, vector rho) {
int K = num_elements(mu);
matrix[K, K] S = diag_matrix(rep_vector(1, K));
matrix[K, K] L;
int c;

for (k1 in 1:(K - 1))
for (k2 in (k1 + 1):K){
c = K - (k2 - k1);
S[k1, k2] = exp(- squared_distance(mu[k1], mu[k2]) / (0.5 + rho[c]));
S[k2, k1] = S[k1, k2];
}

L = cholesky_decompose(S);

return 2 * sum(log(diagonal(L)));
}
}
data{
int k; // no of signals
int N;// total number of points
vector[N] time; // observed time points
vector[N] y; // observed signal values
}

parameters{
positive_ordered[k] A; // signal amplitude
positive_ordered[k] a; // signal decay constant
real<lower=0, upper=1> phi_raw;
real<lower=0, upper=1 - phi_raw> dphi;
real<lower=5, upper=15> f_raw;
real<lower=0, upper=15 - f_raw> df;
real<lower=0> sigma; // signal error
}
transformed parameters {
vector[k] phi;
vector[k] f;
// phi is an implied ordered vector with lower bound = 0 and upper bound = 1
phi[1] = phi_raw;
phi[2] = phi_raw + dphi;
// f is an implied ordered vector with lower bound = 5 and upper bound = 15
f[1] = f_raw;
f[2] = f_raw + df;
}
model {
// priors
a ~ normal(0, 2);
A ~ normal(0, 5);
sigma ~ normal(0, 0.2);
[f_raw, f_raw + df]' ~ repulsive([0.5, 0.5]');
// posterior
{
vector[N] B1 = A[1] * cos( 2 * pi() * time * f[2] + 2 * pi() * phi[1] ) .* exp( -a[1] * time );
vector[N] B2 = A[2] * cos( 2 * pi() * time * f[1] + 2 * pi() * phi[2] ) .* exp( -a[2] * time );
y ~ normal(B1 + B2, sigma);
}
}

2 Likes