Positive log-likelihood values happen
From time to time, we get a question from a user puzzled about getting a positive log likelihood for a certain estimation. We get so used to seeing negative log-likelihood values all the time that we may wonder what caused them to be positive.
First, let me point out that there is nothing wrong with a positive log likelihood.
The likelihood is the product of the density evaluated at the observations. Usually, the density takes values that are smaller than one, so its logarithm will be negative. However, this is not true for every distribution.
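In symbols, and assuming independent observations (the notation here is mine, not taken from any particular reference), the product and its logarithm can be written as:

```latex
L(\theta) = \prod_{i=1}^{n} f(y_i;\theta),
\qquad
\ln L(\theta) = \sum_{i=1}^{n} \ln f(y_i;\theta)
```

Each term $\ln f(y_i;\theta)$ is negative when the density at $y_i$ is below one and positive when it is above one, so the sign of the total depends on how many observations fall where the density is high.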
For example, let’s think of the density of a normal distribution with a small standard deviation, let’s say 0.1.
. di normalden(0,0,.1)
3.9894228
This density concentrates almost all of its area in a narrow interval around zero, so it must take large values there for the total area to equal one. Naturally, the logarithm of this value will be positive.
. di log(3.9894228)
1.3836466
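To see where the sign changes, recall that a normal density peaks at its mean with height 1/(sigma*sqrt(2*pi)), so the peak exceeds one only when sigma is below 1/sqrt(2*pi), roughly 0.399. The following is a small sketch of that check; the particular sigma values in the loop are arbitrary choices for illustration.

```stata
* Peak height of a normal density with sigma = .1
di 1/(.1*sqrt(2*_pi))

* Largest sigma for which the peak is still above one
di 1/sqrt(2*_pi)

* Log density at the mean for a few illustrative standard deviations
foreach s in .1 .3 .5 1 {
    di "sigma = `s'" _col(16) "log density at the mean = " ln(normalden(0, 0, `s'))
}
```

The log density at the mean should be positive for the first two values of sigma and negative for the last two.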
In model estimation, the situation is a bit more complex. When you fit a model to a dataset, the log likelihood will be evaluated at every observation. Some of these evaluations may turn out to be positive, and some may turn out to be negative. The sum of all of them is reported. Let me show you an example.
I will start by simulating a dataset appropriate for a linear model.
clear
program drop _all
set seed 1357
set obs 100
gen x1 = rnormal()
gen x2 = rnormal()
gen y = 2*x1 + 3*x2 +1 + .06*rnormal()
I will borrow the code for mynormal_lf from the book Maximum Likelihood Estimation with Stata (W. Gould, J. Pitblado, and B. Poi, 2010, Stata Press) in order to fit my model via maximum likelihood.
program mynormal_lf
    version 11.1
    args lnf mu lnsigma
    quietly replace `lnf' = ln(normalden($ML_y1,`mu',exp(`lnsigma')))
end

ml model lf mynormal_lf (y = x1 x2) (lnsigma:)
ml max, nolog
The following table will be displayed:
. ml max, nolog

                                                  Number of obs   =        100
                                                  Wald chi2(2)    =  456919.97
Log likelihood =  152.37127                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |
          x1 |   1.995834    .005117   390.04   0.000     1.985805    2.005863
          x2 |   3.014579   .0059332   508.08   0.000      3.00295    3.026208
       _cons |   .9990202   .0052961   188.63   0.000       .98864      1.0094
-------------+----------------------------------------------------------------
lnsigma      |
       _cons |  -2.942651   .0707107   -41.62   0.000    -3.081242   -2.804061
------------------------------------------------------------------------------
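We can see that the coefficient estimates are close to the true values (2, 3, and 1), that the implied standard deviation, exp(-2.942651), or about 0.053, is close to the 0.06 used in the simulation, and that the log likelihood is positive.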
We can obtain the log likelihood for each observation by substituting the estimates in the log-likelihood formula:
. predict double xb
. gen double lnf = ln(normalden(y, xb, exp([lnsigma]_b[_cons])))
. summ lnf, detail

                             lnf
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -1.360689      -1.574499
 5%    -.0729971       -1.14688
10%     .4198644      -.3653152       Obs                 100
25%     1.327405      -.2917259       Sum of Wgt.         100

50%     1.868804                      Mean           1.523713
                        Largest       Std. Dev.      .7287953
75%     1.995713       2.023528
90%     2.016385       2.023544       Variance       .5311426
95%     2.021751       2.023676       Skewness      -2.035996
99%     2.023691       2.023706       Kurtosis       7.114586

. di r(sum)
152.37127

. gen f = exp(lnf)
. summ f, detail

                              f
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .2623688       .2071112
 5%     .9296673       .3176263
10%      1.52623       .6939778       Obs                 100
25%     3.771652       .7469733       Sum of Wgt.         100

50%     6.480548                      Mean           5.448205
                        Largest       Std. Dev.      2.266741
75%     7.357449       7.564968
90%      7.51112        7.56509       Variance       5.138117
95%     7.551539       7.566087       Skewness      -.8968159
99%     7.566199        7.56631       Kurtosis       2.431257
We can see that some values for the log likelihood are negative, but most are positive, and that the sum is the value we already know. In the same way, most of the values of the likelihood are greater than one.
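The pattern in these summaries follows from the estimate of lnsigma alone. An observation with a residual of zero gives the largest possible contribution, -lnsigma - 0.5*ln(2*pi), and contributions stay positive as long as the residual is within a band around zero. The sketch below computes those quantities from the reported estimate; the scalar name lnsig is mine, and the value is typed in by hand from the output above rather than read from e(b).

```stata
* lnsigma estimate as reported in the ml output above
scalar lnsig = -2.942651

* Upper bound on any single log-likelihood contribution (residual of zero)
di -lnsig - 0.5*ln(2*_pi)

* The same bound on the density scale
di exp(-lnsig)/sqrt(2*_pi)

* Half-width of the residual band in which a contribution is positive
di exp(lnsig)*sqrt(-2*(lnsig + 0.5*ln(2*_pi)))
```

The first two results should come out just above the largest values in the two summaries (about 2.0237 and 7.566), and the third shows that residuals within roughly two estimated standard deviations of zero give positive contributions.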
As an exercise, rerun the commands above with a larger error standard deviation, say, 1 (replace .06 with 1 in the line that generates y). The density will now be flatter, and none of its values will be greater than one.
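One possible way to run the exercise is sketched below: it repeats the simulation and the fit for both error standard deviations and reports how many per-observation contributions are positive. The loop, the local macro names, and the display lines are my own additions; the evaluator and the model statement are the ones used above. Because random-number generators differ across Stata versions, the numbers may not reproduce the output shown earlier exactly.

```stata
clear
program drop _all

* Evaluator from above: normal linear model parameterized in ln(sigma)
program mynormal_lf
    version 11.1
    args lnf mu lnsigma
    quietly replace `lnf' = ln(normalden($ML_y1, `mu', exp(`lnsigma')))
end

foreach sd in .06 1 {
    * Simulate the same design with error standard deviation `sd'
    clear
    set seed 1357
    set obs 100
    gen x1 = rnormal()
    gen x2 = rnormal()
    gen y = 2*x1 + 3*x2 + 1 + `sd'*rnormal()

    * Fit by maximum likelihood
    ml model lf mynormal_lf (y = x1 x2) (lnsigma:)
    quietly ml max, nolog

    * Per-observation log-likelihood contributions at the estimates
    predict double xb
    gen double lnf = ln(normalden(y, xb, exp([lnsigma]_b[_cons])))

    quietly count if lnf > 0
    local npos = r(N)
    quietly summarize lnf
    di "error sd = `sd': `npos' of 100 contributions are positive"
    di "  sum of contributions = " r(sum) "   e(ll) = " e(ll)
}
```

With an error standard deviation of 1, the count of positive contributions should be zero and the reported log likelihood negative; in both runs the sum of the contributions should match e(ll).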
In short, if you have a positive log likelihood, there is nothing wrong with that, but if you check your dispersion parameters, you will find they are small.