Analyzing data in an improvement project can be far from challenging when the data follow a normal distribution. With such data, many things go well, but what if you are not so lucky? If you use Minitab or many other statistical programs, you can either let the program find a distribution that better fits the data or find a transformation that makes the data more normal. This is a case where just because you can does not make it your best choice! One of the best methods for managing this issue dates back to the pre-computer era.
What to do if your data is not normally distributed…
1. Examine the process that generated the data.
1-a. Use theory, Subject Matter Experts, and experience to determine what the distribution should be. Transform the data based on knowledge of the process.
1-b. Try the standard transformations:
* For cycle time data: take the natural log of the data.
* For percent or defect data near zero: take the square root of the data.
* For yield data near 100%: convert to defect data and try again.
1-c. Transformations expect the data to be > 0.0 (no negatives and no zeros) and to have a small minimum value. You may need to shift the data by adding or subtracting a constant so that the minimum is slightly larger than zero, and then retry the transformations.
1-d. Keep in mind that transformations only help data with a skewed shape. If the histogram is not skewed, look for the cause of its shape instead. Typically, such shapes come from data originating from more than one process or from an unstable process.
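Steps 1-b through 1-d can be tried quickly in code. The sketch below is a minimal, standard-library Python illustration, not a Minitab feature or the author's own procedure. It assumes the moment-based skewness statistic as the measure of shape; `shift_min`, `yield_to_defect`, and `try_transforms` are hypothetical helper names, and the `margin` used when shifting is an assumed value you should choose to be small relative to your data.

```python
import math
import statistics

def skewness(data):
    """Moment-based sample skewness (g1): about 0 for symmetric data,
    positive for a long right tail, negative for a long left tail."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def shift_min(data, margin=1.0):
    """Hypothetical helper (1-c): add or subtract a constant so the smallest
    value becomes `margin`, an assumed small positive number."""
    low = min(data)
    return [x - low + margin for x in data]

def yield_to_defect(yield_pct):
    """Yield near 100% (1-b): convert to percent defective before transforming."""
    return [100.0 - y for y in yield_pct]

def try_transforms(data, margin=None):
    """Skewness of the raw data and of the log and square-root transforms.
    If `margin` is given, the data are shifted first with shift_min()."""
    if margin is not None:
        data = shift_min(data, margin)
    if min(data) <= 0:
        raise ValueError("all values must be > 0; pass a margin to shift them")
    return {
        "raw": skewness(data),
        "log": skewness([math.log(x) for x in data]),    # e.g. cycle time
        "sqrt": skewness([math.sqrt(x) for x in data]),  # e.g. percent/defect data near zero
    }

# Example: right-skewed cycle times (illustrative values only).
cycle_times = [2, 3, 3, 4, 5, 6, 8, 12, 20, 35]
result = try_transforms(cycle_times)
for name, value in result.items():
    print(f"{name:>4}: skewness = {value:.2f}")
```

A transformation that moves the skewness closer to zero is a candidate worth checking on a histogram. If the raw skewness is already near zero, then, as 1-d says, a transformation will not help and the shape has some other cause.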
2. If none of these methods provides acceptably normal data, then you are probably looking at a significant cause in your process. Find out why the data are non-normal and work to eliminate that cause. It might be one of the following:
2-a. The data could be separated into distinct groups, each with its own mean and variation, such as different equipment, operators, raw materials, sites, or product types. This kind of data typically has a bimodal or multimodal histogram.
2-b. You have an unstable process where the mean and/or the variance are not constant. This kind of data typically has a histogram with a flat top and short tails (platykurtic).
2-c. Most of the data follow a normal distribution, but the tails are too long: the histogram has a sharp peak in the middle and long, thin tails compared with a normal distribution (leptokurtic). This is caused by frequent "special causes" that produce extreme values, after which the process quickly returns to normal.
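To look for the causes in 2-a through 2-c, the sketch below (standard-library Python; `excess_kurtosis` and `summarize_by_group` are hypothetical helper names) computes the moment-based excess kurtosis and a per-group mean and standard deviation. Excess kurtosis is roughly zero for normal data, negative for flat-topped, short-tailed (platykurtic) data, and positive for sharp-peaked, long-tailed (leptokurtic) data. The grouping labels are an assumption: use whatever might separate the data, such as machine, operator, or site for 2-a, or a time period such as shift or week to see whether the mean or variation drifts (2-b).

```python
import statistics
from collections import defaultdict

def excess_kurtosis(data):
    """Moment-based excess kurtosis (g2): about 0 for normal data, negative
    for flat-topped, short-tailed data (platykurtic), positive for a sharp
    peak with long tails (leptokurtic)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3.0

def summarize_by_group(values, labels):
    """Hypothetical helper: mean and standard deviation of the values for
    each label (machine, operator, site, week, ...)."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    summary = {}
    for label, vals in groups.items():
        # stdev needs at least two points; report 0.0 for a single value.
        spread = statistics.stdev(vals) if len(vals) > 1 else 0.0
        summary[label] = (statistics.fmean(vals), spread)
    return summary

# Illustrative data only: two machines with different averages (case 2-a).
values = [9.8, 10.1, 10.0, 9.9, 12.0, 12.2, 11.9, 12.1]
labels = ["M1"] * 4 + ["M2"] * 4
for label, (mean, sd) in summarize_by_group(values, labels).items():
    print(f"{label}: mean = {mean:.2f}, sd = {sd:.2f}")
print(f"excess kurtosis of all data: {excess_kurtosis(values):.2f}")
```

Group means that differ by much more than the within-group standard deviations suggest that the data come from separate processes, the situation described in 2-a.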
In the beginning I said to use experience… well, what I just wrote is exactly that: what experience has taught me.