@luu

on probability likelihoods

kind of a weak one here, but just a quick insight into how some likelihoods dissolve into quadratics. i plan on writing more about AI & ML research topics & reviewing papers soon as well, so let this be the first of many. (targeting linear regression in this one lol sorry)

                                                 .

i'll assume a regression problem with the likelihood function:

p(y | x) = 𝒩(y | f(x), σ²)

where x ∈ ℝᴰ are inputs and y ∈ ℝ are just noisy function values (targets)
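
to make the setup concrete, here's a tiny numpy sketch of what sampling from this likelihood looks like. the linear f(x) = xᵀw, the weights, and σ are just made-up illustrative choices, not part of the derivation:

import numpy as np

rng = np.random.default_rng(0)

# illustrative setup: a linear f(x) = x @ w in D = 2 dimensions, noise std sigma
D, N, sigma = 2, 100, 0.5
w_true = np.array([1.5, -2.0])

X = rng.normal(size=(N, D))                          # inputs x in R^D
y = X @ w_true + rng.normal(scale=sigma, size=N)     # targets y ~ N(f(x), sigma^2)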

                                       .

you get an easier analytical solution when you assume a Gaussian (normal) likelihood:

𝒩(y | f(x), σ²)

and i'll explain why:

this is because the log-likelihood dissolves into a quadratic, which enables closed-form solutions, as well as simple derivatives if you opt for a gradient descent algorithm. check it out:

                                        .

we'll start with the normal pdf for one observation, written out explicitly:

p(y | x) = (1/√(2πσ²)) · exp(−(y − f(x))² / (2σ²))
                                        .

and then take the log-likelihood (usually done to turn products into sums, btw):

log p(y | x) = −(1/2)·log(2πσ²) − (y − f(x))² / (2σ²)

notice that the only term that depends on your parameters (through f) is the squared error
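
quick sanity check of that log-density before moving on (the observation and prediction below are made up), comparing the hand-written expression against scipy's Gaussian log-pdf:

import numpy as np
from scipy.stats import norm

sigma = 0.5
y, fx = 1.2, 0.9   # one made-up observation y and its prediction f(x)

# the log-density exactly as written above
log_p_manual = -0.5 * np.log(2 * np.pi * sigma**2) - (y - fx)**2 / (2 * sigma**2)

# the same quantity via scipy
log_p_scipy = norm.logpdf(y, loc=fx, scale=sigma)

print(np.isclose(log_p_manual, log_p_scipy))   # True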

                                        .

now take the sum over all N data points:

log ℒ(θ) = Σᵢ₌₁ᴺ log p(yᵢ | xᵢ) = −(N/2)·log(2πσ²) − (1/(2σ²))·Σᵢ₌₁ᴺ (yᵢ − f(xᵢ))²

so you clearly see: since the first term doesn't depend on θ, maximizing the log-likelihood is no different from minimizing the sum of squared errors, a quadratic.
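
and since this post is targeting linear regression anyway, here's a small sketch (toy data, made-up weights and learning rate) showing both routes landing on the same parameters: the closed-form least-squares solution, and gradient descent on the negative log-likelihood, whose gradient is just the squared-error gradient scaled by 1/σ²:

import numpy as np

rng = np.random.default_rng(0)

# toy linear model f(x) = x @ w (illustrative choices, not from the post)
N, D, sigma = 200, 3, 0.3
w_true = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(N, D))
y = X @ w_true + rng.normal(scale=sigma, size=N)

# route 1: minimize the sum of squared errors in closed form (normal equations)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# route 2: maximize the log-likelihood by gradient descent on the negative log-likelihood;
# d(-log L)/dw = -(1/sigma^2) * X^T (y - X w), i.e. the squared-error gradient / sigma^2
w_mle = np.zeros(D)
lr = 1e-4
for _ in range(5000):
    grad = -(X.T @ (y - X @ w_mle)) / sigma**2
    w_mle -= lr * grad

print(np.allclose(w_ols, w_mle, atol=1e-4))   # True: same minimizer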

                                        .

i will shortly update this post with the multivariate/linear algebra form to better show how likelihoods dissolve.

@luuchrist