Convexity of a single-level hierarchical model

So we have a single-level hierarchical model with Gaussian priors, one global and p local. To be explicit:
p(\theta | x) \propto p(x | \theta) p(\theta)

In this case, for simplicity, we’re assuming the variances are known (which is not what we’d want in practice):

\beta_0 \sim \mathrm{Normal}(0, \sigma_0^2)
\beta_j \mid \beta_0, \sigma_j^2 \sim \mathrm{Normal}(\beta_0, \sigma_j^2)
y \mid \beta_j, X \sim \mathrm{Normal}(X\beta_j, \sigma^2)

p(x \mid \theta)\, p(\theta) becomes:

p(y \mid \beta_j, X)\, p(\beta_j \mid \beta_0)\, p(\beta_0)

Taking the negative log and dropping the normalization constants, we get our objective function:

\frac{\sum(y - X\beta_j)^2}{2\sigma^2} + \frac{(\beta_j - \beta_0)^2}{2\sigma_j^2} + \frac{\beta_0^2}{2\sigma_0^2}

We’d want to minimize this negative log-posterior (equivalently, maximize the posterior).

And then I take a quick look at convexity with respect to each parameter. Here I’m fuzzy about the inequalities involved.

A few questions:

  1. I’ve done Gibbs sampler derivations a few times, and in that case it’s easy to see how the conditional posterior mean ends up as a precision-weighted average of the global and local information. Here, it was appealing to just take the log because it looks so much like a standard regularized regression problem. What am I doing wrong?
  2. I’m given this inequality to check whether something is convex, which makes sense when we have some arbitrary function and not many parameters, but it’s not so clear what to do when I’m looking at a Bayesian model (with lots of parameters). Check that, for all x, y \in \mathrm{dom}\, f and \lambda \in [0, 1] (using \lambda for the mixing weight to avoid a clash with the model parameters): f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y)

OK, and, for simplicity, ignoring the hierarchical prior in the objective function, the left-hand side would turn out to be… OK, we need x, y \in \mathrm{dom}\, f, so I’m guessing I hold everything else constant and only look at \beta… how do I unpack this inequality exactly? It’s not so clear when the function becomes more complex.
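
For what it’s worth, here is how the unpacking goes in the simplest case I can think of (my own sketch, with a single scalar covariate x, one coefficient, all variances fixed, and \lambda as the mixing weight). Take f(\beta) = (y - x\beta)^2 / (2\sigma^2) and compare the two sides of the definition at points u and v:

\lambda f(u) + (1-\lambda) f(v) - f(\lambda u + (1-\lambda) v) = \frac{x^2}{2\sigma^2}\, \lambda (1-\lambda)(u - v)^2 \geq 0

The linear and constant pieces of the quadratic cancel exactly, so only the squared term survives; that is why the sign of the leading coefficient is all that matters, and the same cancellation happens term by term in the larger objective.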

I’ll think a bit more about it.

Any recasting of this problem to make it clearer would be much appreciated. For example, one exercise just substituted an arbitrary line into a quadratic, and it was clear from “generalized” high-school algebra/calculus intuition that it was convex (positive leading coefficient on the even-degree term). This was very easy to see with some basic matrix algebra.
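
The same restrict-to-a-line trick seems to carry over to the matrix form here; as a rough sketch (with \beta_j the coefficient vector, variances fixed, and b, v an arbitrary point and direction), substitute the line \beta_j = b + t v into the first term of the objective:

g(t) = \frac{\bigl(y - X(b + tv)\bigr)^\top \bigl(y - X(b + tv)\bigr)}{2\sigma^2} = \frac{\lVert Xv \rVert^2}{2\sigma^2}\, t^2 + (\text{terms linear and constant in } t)

The leading coefficient \lVert Xv \rVert^2 / (2\sigma^2) is nonnegative for every direction v, so this term is convex along every line, which is exactly the high-school quadratic picture; the two prior terms restrict to quadratics in t with nonnegative leading coefficients in the same way.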

What am I missing?

Edit:

The first term should be, excluding constants: (y - X\beta_j)^\top (y - X\beta_j)

There are many things I don’t understand in your post. But here’s something you might have overlooked. If you want to know whether the objective function f is convex w.r.t. a certain parameter \theta, you could try checking that \frac{\partial^2 f}{\partial\theta^2} is nonnegative on its entire domain. Of course, that requires f to be \mathcal{C}^2. If you need global convexity, this notion generalises and you need to show that the Hessian is positive semidefinite.
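
To make that concrete for the objective posted above (a sketch, with the variances held fixed and, to keep the dimensions simple, treating \beta_j as a scalar coefficient whose covariate column I’ll write as x_j), the second derivatives are constants:

\frac{\partial^2 f}{\partial \beta_j^2} = \frac{x_j^\top x_j}{\sigma^2} + \frac{1}{\sigma_j^2} > 0, \qquad \frac{\partial^2 f}{\partial \beta_0^2} = \frac{1}{\sigma_j^2} + \frac{1}{\sigma_0^2} > 0

With \beta_j a vector, the first block becomes X^\top X/\sigma^2 + I/\sigma_j^2, which is positive definite whenever \sigma_j^2 is finite; and since each term of f is a convex quadratic in an affine function of (\beta_j, \beta_0), the full Hessian is positive semidefinite as well.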

Thanks. It’s convex. A nonnegative weighted sum of convex functions is convex.

These terms are all quadratics with positive leading coefficients and are therefore convex.

Edit:

We can also just use simple conjugacy intuition: a Gaussian prior times a Gaussian likelihood gives a Gaussian posterior; chain in another Gaussian and the posterior is still Gaussian, so the negative log-posterior is a quadratic and hence convex.
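
If a numerical sanity check helps, here is a minimal sketch (my own, with made-up toy data and the variances assumed known, as in the post): it evaluates the negative log-posterior above, forms the Hessian by central finite differences, and checks that all eigenvalues are nonnegative. Since the objective is quadratic, the Hessian is constant, so any evaluation point gives the same answer.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))                   # toy design matrix
y = rng.normal(size=n)                        # toy response
sigma2, sigma2_j, sigma2_0 = 1.0, 2.0, 5.0    # known variances (assumed values)

def neg_log_post(params):
    """Negative log-posterior from the post, up to an additive constant.

    params = (beta_1, ..., beta_p, beta_0); the local prior is applied
    coordinatewise around the global mean beta_0.
    """
    beta, beta0 = params[:p], params[p]
    resid = y - X @ beta
    return (resid @ resid / (2 * sigma2)
            + np.sum((beta - beta0) ** 2) / (2 * sigma2_j)
            + beta0 ** 2 / (2 * sigma2_0))

def hessian(f, x, eps=1e-5):
    """Central finite-difference Hessian of f at x."""
    d = len(x)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d); ei[i] = eps
            ej = np.zeros(d); ej[j] = eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps ** 2)
    return H

theta = rng.normal(size=p + 1)                # arbitrary evaluation point
H = hessian(neg_log_post, theta)
print(np.linalg.eigvalsh(H))                  # all >= 0 (up to numerical noise)
```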