on probability likelihoods
kind of a weak one here, but just a quick insight regarding how some likelihoods dissolve into quadratics. i plan on writing more about AI & ML research topics & reviewing papers soon as well, so let this be the first of many. (targeting linear regression in this one lol sorry)
.
i'll assume a regression problem with the likelihood function p(y | x, θ),
where x ∈ R^D are inputs and y ∈ R are just noisy function values (targets)
.
you get an easier analytical solution when you assume a Gaussian (normal) distribution:

p(y | x, θ) = N(y | f(x; θ), σ²)

and i'll explain why;
this is because the likelihoods dissolve into quadratics, which enables closed-form solutions, as well as clean derivatives if you opt for a gradient descent algorithm. check it out;
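a quick numerical sanity check of that derivative claim — just a sketch, assuming a linear model f(x; w) = Xw with made-up data and a noise level σ i picked myself:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))              # 50 inputs, 3 features (made-up data)
w_true = np.array([1.0, -2.0, 0.5])
sigma = 0.3
y = X @ w_true + sigma * rng.normal(size=50)  # noisy targets

def neg_log_likelihood(w):
    resid = y - X @ w
    return 0.5 * np.sum(resid**2) / sigma**2 + len(y) * 0.5 * np.log(2 * np.pi * sigma**2)

def sse(w):
    # sum of squared errors, no probability in sight
    resid = y - X @ w
    return np.sum(resid**2)

w = np.array([0.2, 0.1, -0.4])            # arbitrary parameter guess

def num_grad(f, w, eps=1e-6):
    # central finite differences, one coordinate at a time
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

g_nll = num_grad(neg_log_likelihood, w)
g_sse = num_grad(sse, w)

# identical up to the constant 1/(2σ²): gradient descent on either
# objective moves in the same direction
assert np.allclose(g_nll, g_sse / (2 * sigma**2), rtol=1e-4)
```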
.
we'll start with the normal pdf for one observation (xₙ, yₙ) explicitly;

p(yₙ | xₙ, θ) = (1 / √(2πσ²)) exp(−(yₙ − f(xₙ; θ))² / (2σ²))
.
and then take the log-likelihood (usually done to simplify products into sums btw);

log p(yₙ | xₙ, θ) = −(yₙ − f(xₙ; θ))² / (2σ²) − ½ log(2πσ²)

notice that the only term that depends on your parameters θ is the squared error (yₙ − f(xₙ; θ))²
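here's that observation as a tiny sketch (the function name and numbers are mine, not from any library): the log(2πσ²) piece is constant in the parameters, so it cancels whenever you compare two candidate predictions.

```python
import math

def log_normal_pdf(y, mean, sigma):
    # log of the normal pdf: a squared-error term plus a term constant in the mean
    return -(y - mean)**2 / (2 * sigma**2) - 0.5 * math.log(2 * math.pi * sigma**2)

# changing the prediction f(x; θ) only moves the squared-error term;
# the -0.5*log(2πσ²) piece cancels in any comparison of parameters
sigma, y = 1.0, 2.0
diff = log_normal_pdf(y, 1.5, sigma) - log_normal_pdf(y, 0.5, sigma)
expected = (-(y - 1.5)**2 + (y - 0.5)**2) / (2 * sigma**2)
assert abs(diff - expected) < 1e-12
```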
.
now view the sum over all N data points;

log p(y₁, …, y_N | x₁, …, x_N, θ) = Σₙ log p(yₙ | xₙ, θ) = −(1/(2σ²)) Σₙ (yₙ − f(xₙ; θ))² + const.

so you clearly see, maximizing the log-likelihood is no different than minimizing the sum of squared errors, a quadratic.
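and because it's a quadratic, the closed-form solution drops straight out of the normal equations. a minimal sketch, again assuming a linear model f(x; w) = Xw with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))                                  # made-up inputs
y = X @ np.array([3.0, -1.0]) + 0.1 * rng.normal(size=100)     # noisy targets

# normal equations: the unique minimizer of the quadratic sum of squared
# errors, hence also the maximum-likelihood estimate under the Gaussian model
w_mle = np.linalg.solve(X.T @ X, X.T @ y)

def log_likelihood(w, sigma=0.1):
    resid = y - X @ w
    return -0.5 * np.sum(resid**2) / sigma**2 - len(y) * 0.5 * np.log(2 * np.pi * sigma**2)

# any other parameter vector has strictly lower log-likelihood
assert log_likelihood(w_mle + 0.1) < log_likelihood(w_mle)
```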
.
i will shortly update this post with the multivariate/linear algebra form to better show how likelihoods dissolve.
@luuchrist