@luu

on probability likelihoods

kind of a weak one here, but just a quick insight into how some likelihoods dissolve into quadratics. i plan on writing more about AI & ML research topics & reviewing papers soon as well, so let this be the first of many. (targeting linear regression in this one lol sorry)

                                                 .

i'll assume a regression problem with the likelihood function:

p(y | x) = 𝒩(y | f(x), σ²)

where x ∈ ℝᴰ are inputs and y ∈ ℝ are just noisy function values (targets)
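
to make the setup concrete, here's a tiny numpy sketch of what sampling from this likelihood looks like. the linear f(x) = xᵀw, the weights, and σ are just made-up illustrative choices, not part of the derivation:

import numpy as np

rng = np.random.default_rng(0)

# illustrative setup: a linear f(x) = x @ w in D = 2 dimensions, noise std sigma
D, N, sigma = 2, 100, 0.5
w_true = np.array([1.5, -2.0])

X = rng.normal(size=(N, D))                          # inputs x in R^D
y = X @ w_true + rng.normal(scale=sigma, size=N)     # targets y ~ N(f(x), sigma^2)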

                                       .

you get an easier analytical solution when you assume a Gaussian (normal) likelihood:

𝒩(y | f(x), σ²)

and i'll explain why:

this is because the log-likelihood dissolves into a quadratic, which enables closed-form solutions, as well as simple derivatives if you opt for a gradient descent algorithm. check it out:

                                        .

we'll start with the normal pdf for one observation, written out explicitly:

p(y | x) = (1/√(2πσ²)) · exp(−(y − f(x))² / (2σ²))
                                        .

and then take the log-likelihood (usually done to turn products into sums, btw):

log p(y | x) = −(1/2)·log(2πσ²) − (y − f(x))² / (2σ²)

notice that the only term that depends on your parameters (through f) is the squared error
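
quick sanity check of that log-density before moving on (the observation and prediction below are made up), comparing the hand-written expression against scipy's Gaussian log-pdf:

import numpy as np
from scipy.stats import norm

sigma = 0.5
y, fx = 1.2, 0.9   # one made-up observation y and its prediction f(x)

# the log-density exactly as written above
log_p_manual = -0.5 * np.log(2 * np.pi * sigma**2) - (y - fx)**2 / (2 * sigma**2)

# the same quantity via scipy
log_p_scipy = norm.logpdf(y, loc=fx, scale=sigma)

print(np.isclose(log_p_manual, log_p_scipy))   # True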

                                        .

now take the sum over all N data points:

log ℒ(θ) = Σᵢ₌₁ᴺ log p(yᵢ | xᵢ) = −(N/2)·log(2πσ²) − (1/(2σ²))·Σᵢ₌₁ᴺ (yᵢ − f(xᵢ))²

so you clearly see: since the first term doesn't depend on θ, maximizing the log-likelihood is no different from minimizing the sum of squared errors, a quadratic.
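
and since this post is targeting linear regression anyway, here's a small sketch (toy data, made-up weights and learning rate) showing both routes landing on the same parameters: the closed-form least-squares solution, and gradient descent on the negative log-likelihood, whose gradient is just the squared-error gradient scaled by 1/σ²:

import numpy as np

rng = np.random.default_rng(0)

# toy linear model f(x) = x @ w (illustrative choices, not from the post)
N, D, sigma = 200, 3, 0.3
w_true = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(N, D))
y = X @ w_true + rng.normal(scale=sigma, size=N)

# route 1: minimize the sum of squared errors in closed form (normal equations)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# route 2: maximize the log-likelihood by gradient descent on the negative log-likelihood;
# d(-log L)/dw = -(1/sigma^2) * X^T (y - X w), i.e. the squared-error gradient / sigma^2
w_mle = np.zeros(D)
lr = 1e-4
for _ in range(5000):
    grad = -(X.T @ (y - X @ w_mle)) / sigma**2
    w_mle -= lr * grad

print(np.allclose(w_ols, w_mle, atol=1e-4))   # True: same minimizer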

                                        .

i will shortly update this post with the multivariate/linear algebra form to better show how likelihoods dissolve.

@luuchrist