An interesting fact about the Lognormal Distribution

A student asked me about describing the middle of a lognormal distribution, so here is the answer and a piece of trivia.

His choice was the geometric mean, which I have heard of but have not used.  To calculate a geometric mean, you multiply all n values in the sample together and then take the nth root (where n is the number of data points).  I am not sure where the geometric mean is best used, but it is another way to estimate a middle value, and it is much less affected by a single high or extreme value than the ordinary mean is.
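Here is a small sketch of that calculation in Python, just to make the definition concrete.  The task times are made-up numbers, not data from the student, and the 40.0 is a deliberately extreme value so you can see how far each statistic moves.

```python
import math
import statistics

# Hypothetical task times in hours (illustrative values only).
times_without = [2.0, 3.0, 4.0, 5.0]
times_with = times_without + [40.0]  # same data plus one extreme value


def geometric_mean(values):
    """Multiply all the values together, then take the nth root."""
    n = len(values)
    return math.prod(values) ** (1.0 / n)


gm_without = geometric_mean(times_without)
gm_with = geometric_mean(times_with)
mean_without = statistics.mean(times_without)
mean_with = statistics.mean(times_with)

# The standard library has the same statistic built in (Python 3.8+).
gm_stdlib = statistics.geometric_mean(times_with)

print(f"geometric mean:  {gm_without:.2f} -> {gm_with:.2f}")
print(f"arithmetic mean: {mean_without:.2f} -> {mean_with:.2f}")
```

For a large sample the running product can overflow, so in practice it is safer to use `statistics.geometric_mean` or to take the exponential of the average of the logs.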

The usual way to describe lognormal data is the ordinary (arithmetic) mean of the data.  This is because lognormally distributed data is usually duration data.  If the data is a duration, such as a processing time or a delay time, then the impact on the company is the sum of all the time expended.  For example, if the work has a person assigned to it, then each recorded time is the labor spent on that task.  In this case the average is the best statistic to represent the typical task time.  Why?  Because multiplying the average by the number of items gives the total measured time, which is what the business paid for.

Yes, the mean of a skewed distribution, like a lognormal, can be affected by a single real high value.  If that real high value is an actual event that impacted the company, it should be included in the calculations.  Using a median or geometric mean will effectively hide the extreme values so that the process appears better than it is.
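The point about totals can be checked with a few lines of Python.  This is only a sketch with made-up task times (the 40.0 stands in for a real extreme event).  It multiplies each candidate "middle" value by the number of tasks and compares the result with the hours actually spent.

```python
import statistics

# Hypothetical recorded task times in hours (illustrative values only).
task_hours = [2.0, 3.0, 4.0, 5.0, 40.0]
n = len(task_hours)
total_hours = sum(task_hours)  # what the business actually paid for

summaries = {
    "mean": statistics.mean(task_hours),
    "median": statistics.median(task_hours),
    "geometric mean": statistics.geometric_mean(task_hours),
}

print(f"actual total: {total_hours:.1f} hours")
for name, value in summaries.items():
    implied_total = value * n  # total you would report using this statistic
    print(f"{name:>15}: {value:6.2f} x {n} = {implied_total:6.1f} hours")
```

Only the mean reproduces the actual total.  With this data the median and the geometric mean both imply far fewer hours than were really spent.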

Now for an interesting fact about the lognormal distribution.

If you follow these steps, you will see the interesting fact.

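  1. Make a set of random normal numbers with a mean of 100 and a standard deviation of 5.
  2. Create a new column of numbers by exponentiating each random value, e^x.  The natural log of this new column is the original normal data, which is what makes the new column lognormal.  (The values will be enormous, around e^100, which is fine for this exercise.)
  3. Create a lognormal probability plot of the new column.
  4. Take a look at the lognormal distribution parameters.  They should look familiar.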

Yes, the lognormal distribution parameters are the mean and the standard deviation of the normal distribution you get when you take the natural log of the lognormal data.
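If you do not have a statistics package handy, the same fact can be sketched with the Python standard library.  The assumptions here are mine: the seed and sample size are arbitrary, `random.lognormvariate` stands in for the "make lognormal data" step, and instead of reading the parameters off a probability plot the code uses the usual maximum likelihood estimates for a lognormal with no threshold, which are the mean and standard deviation of the logged data.

```python
import math
import random
import statistics

rng = random.Random(12345)  # arbitrary seed so reruns give the same numbers
mu, sigma = 100.0, 5.0      # mean and standard deviation of the normal
n = 10_000                  # arbitrary sample size

# lognormvariate(mu, sigma) draws a value whose natural log is
# normally distributed with mean mu and standard deviation sigma.
lognormal_data = [rng.lognormvariate(mu, sigma) for _ in range(n)]

# Estimate the lognormal parameters from the logged data.
logs = [math.log(y) for y in lognormal_data]
mu_hat = statistics.fmean(logs)
sigma_hat = statistics.pstdev(logs)

print(f"estimated lognormal parameters: {mu_hat:.2f}, {sigma_hat:.2f}")
print(f"normal mean and std dev used:   {mu:.2f}, {sigma:.2f}")
```

The two lines of output should be close to each other, with the small differences coming from random sampling.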

Now you know a trivia fact to use with your nerdy friends.