The topic of autocorrelation data analysis came up during my Lean Six Sigma Master Black Belt course last week and in two coaching sessions the prior week. Since this is an uncommon topic, it surprised me. I am using the blog to record the answers provided in class and in coaching sessions.
Autocorrelation Data Analysis: What is autocorrelated data?
Autocorrelated data occurs when sequential data values are correlated with each other. The most common example is the daily price of any publicly traded stock. Today’s price is correlated with yesterday’s price, because today’s price cannot change very much from the prior day’s value. Most stocks are trending up or down a bit, so the day-to-day change is only partly random. Another way to collect autocorrelated data would be to record your room temperature every 5 minutes. Since thermostats cycle only once or twice an hour, the room temperature usually trends one way and then jumps back when the HVAC kicks in. The room could heat up slowly and then cool rapidly when the air conditioning runs, or the opposite: it cools slowly and then heats up rapidly when the heaters turn on.
With autocorrelated data, you know more about each data point when you know the prior data value. The value of the current point is effectively a combination of the prior data value, any existing trend, and a random variation.
The opposite of autocorrelated data is uncorrelated data. For uncorrelated data, each value represents a random selection from a population; in other words, the data behaves as if it came from a random number generator.
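To make the contrast concrete, here is a minimal Python sketch (not from the class material). It assumes a simple model in which each point keeps 90% of the prior value and adds fresh random noise. The `carry_over` value, the seed, and the helper name `lag1_autocorrelation` are illustrative choices only.

```python
import random

def lag1_autocorrelation(values):
    """Correlation between each value and the value just before it."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i] - mean) * (values[i - 1] - mean) for i in range(1, n))
    den = sum((v - mean) ** 2 for v in values)
    return num / den

rng = random.Random(1)  # fixed seed so the sketch is repeatable
noise = [rng.gauss(0, 1) for _ in range(500)]  # uncorrelated data

# Assumed model: current value = carry_over * prior value + random variation
carry_over = 0.9
series = [noise[0]]
for e in noise[1:]:
    series.append(carry_over * series[-1] + e)

r_auto = lag1_autocorrelation(series)
r_noise = lag1_autocorrelation(noise)
print("lag-1 autocorrelation, carry-over series:", round(r_auto, 2))
print("lag-1 autocorrelation, pure random data: ", round(r_noise, 2))
```

Under these assumptions the carry-over series should show a lag-1 autocorrelation close to the carry-over value, while the pure random data should land near zero.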
Lean Six Sigma impact of Autocorrelated Data
The questions from my Master Black Belt course and from my two student coaching sessions all centered around control charting of autocorrelated data. It turns out that a control chart is strongly affected by autocorrelated data, because the range or moving-range values are suppressed (smaller) as a result of the autocorrelation. Think about it: a control chart uses the average range or moving range to estimate the population standard deviation, which is then used to calculate the control limits.
Here is an I-chart of autocorrelated data (created by adding a random number to a sine function):
Autocorrelated data in an I-mR chart
In this chart you can see how the short-term random variation is quite different from the long-term sinusoidal variation. Since the control limits are based on the moving range (MR-bar = 0.24), the chart shows the process as out-of-control: the sine wave is a source of variation that the moving range does not capture. The limits reflect only the short-term variation, not the variation that comes from the autocorrelation.
Now look at the same data without the sine wave applied. In this I-mR chart, the short-term variation is the same as in the earlier plot.
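A rough Python sketch of this effect follows. The amplitude, period, noise level, and seed are my assumptions, not the values behind the chart, so the numbers will not match exactly. The limits use the standard individuals-chart constant 2.66 (3 divided by d2 = 1.128).

```python
import math
import random

def individuals_chart(values):
    """Return (center line, MR-bar, LCL, UCL) for an individuals (I) chart."""
    center = sum(values) / len(values)
    moving_ranges = [abs(values[i] - values[i - 1]) for i in range(1, len(values))]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of two points
    return center, mr_bar, center - 2.66 * mr_bar, center + 2.66 * mr_bar

def count_outside(values):
    """Number of points beyond the control limits computed from the same data."""
    _, _, lcl, ucl = individuals_chart(values)
    return sum(1 for v in values if v < lcl or v > ucl)

# Assumed stand-in data: random noise, with and without a sine wave added
rng = random.Random(2)
n, period = 200, 50
noise = [rng.gauss(0, 0.2) for _ in range(n)]
with_sine = [math.sin(2 * math.pi * i / period) + e for i, e in enumerate(noise)]

for label, data in (("sine + noise", with_sine), ("noise only", noise)):
    _, mr_bar, lcl, ucl = individuals_chart(data)
    print(label, "MR-bar:", round(mr_bar, 2),
          "limits:", round(lcl, 2), round(ucl, 2),
          "points outside:", count_outside(data))
```

With these assumed settings the two MR-bar values should come out close to each other, yet only the series with the sine wave puts many points outside its limits.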
An I-mR chart with uncorrelated data
When you compare the two charts you can see that the MR-bar is nearly the same in both, about 0.24. The second chart is in-control while the first chart is not. The difference is obvious in the charts, but let’s look at it one more way.
Impact of Autocorrelated data on statistics
If we use the average moving range to estimate the standard deviation (r-bar/d2), you can see how the estimate from the autocorrelated data underestimates the population standard deviation.
In this case, r-bar/d2 estimates only the standard deviation of the uncorrelated (random) part of the data.
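The comparison can be sketched in Python as follows. The simulated series uses assumed settings (amplitude 1, period 50, noise standard deviation 0.2), not the data from the charts, and `sigma_from_moving_range` is a hypothetical helper name.

```python
import math
import random
import statistics

D2 = 1.128  # d2 constant for moving ranges of two consecutive points

def sigma_from_moving_range(values):
    """Estimate the standard deviation as MR-bar / d2."""
    moving_ranges = [abs(values[i] - values[i - 1]) for i in range(1, len(values))]
    return (sum(moving_ranges) / len(moving_ranges)) / D2

# Assumed stand-in data: random noise, with and without a sine wave added
rng = random.Random(2)
n, period = 400, 50
noise = [rng.gauss(0, 0.2) for _ in range(n)]
with_sine = [math.sin(2 * math.pi * i / period) + e for i, e in enumerate(noise)]

for label, data in (("sine + noise", with_sine), ("noise only", noise)):
    print(label,
          "MR-bar/d2:", round(sigma_from_moving_range(data), 3),
          "overall standard deviation:", round(statistics.stdev(data), 3))
```

For the noise-only series the two estimates should roughly agree. For the series with the sine wave, MR-bar/d2 should stay near the noise level while the overall standard deviation is several times larger.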
How to deal with autocorrelated data
There are a few methods for dealing with autocorrelated data. The most common is to move to a systematic sampling plan where you record only one data point every so often, with a sampling interval longer than the cycle of the variation source that is driving the autocorrelation. In our example, that means sampling less often than once per period of the sine wave. This is easy for the simulated data because we can see the cycle.
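As a hedged illustration of less frequent sampling, the Python sketch below thins a simulated series and checks the lag-1 autocorrelation at each sampling interval. It assumes a simple carry-over process (each point keeps 90% of the prior value), not the sine-wave data from the charts, and the intervals tried are arbitrary.

```python
import random

def lag1_autocorrelation(values):
    """Correlation between each value and the value just before it."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i] - mean) * (values[i - 1] - mean) for i in range(1, n))
    den = sum((v - mean) ** 2 for v in values)
    return num / den

# Assumed process: current value = 0.9 * prior value + random variation
rng = random.Random(4)
carry_over = 0.9
series = [0.0]
for _ in range(20000 - 1):
    series.append(carry_over * series[-1] + rng.gauss(0, 1))

# Keep one point out of every `interval` points and re-check the correlation
lag1_by_interval = {}
for interval in (1, 5, 20, 50):
    sampled = series[::interval]
    lag1_by_interval[interval] = lag1_autocorrelation(sampled)
    print("every", interval, "points:", len(sampled), "samples,",
          "lag-1 autocorrelation", round(lag1_by_interval[interval], 2))
```

Under this assumption the lag-1 autocorrelation should shrink toward zero as the interval grows, at the cost of far fewer data points.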
But what if the frequency is not so clear? You can try sampling less and less frequently and then test the resulting data for randomness. The randomness check can be performed with a runs test in Minitab. A more analytical method in Minitab is the autocorrelation function. This tool analyzes a column of numbers and indicates which shift (lag) of the data is no longer correlated with the original.
Autocorrelated data is very difficult to deal with. Almost all of our statistical tools carry an unwritten requirement that the data be independent and uncorrelated. It is worth checking independence in almost every set of data.
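Both checks can be approximated outside Minitab. The Python sketch below is my own minimal version, not Minitab's implementation: a runs test about the median using the usual normal approximation, and a sample autocorrelation compared against the rule-of-thumb bound of 2 divided by the square root of n. The data is an assumed sine-plus-noise series, and the function names are hypothetical.

```python
import math
import random
import statistics

def autocorrelation(values, lag):
    """Sample autocorrelation of the series with itself shifted by `lag` points."""
    n = len(values)
    mean = sum(values) / n
    num = sum((values[i] - mean) * (values[i - lag] - mean) for i in range(lag, n))
    den = sum((v - mean) ** 2 for v in values)
    return num / den

def runs_test_z(values):
    """Z-score for the number of runs above/below the median."""
    median = statistics.median(values)
    above = [v > median for v in values]
    n1 = sum(above)
    n2 = len(above) - n1
    runs = 1 + sum(above[i] != above[i - 1] for i in range(1, len(above)))
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / (
        (n1 + n2) ** 2 * (n1 + n2 - 1))
    return (runs - expected) / math.sqrt(variance)

# Assumed stand-in data: sine wave (period 50) plus random noise
rng = random.Random(3)
period, n = 50, 400
data = [math.sin(2 * math.pi * i / period) + rng.gauss(0, 0.2) for i in range(n)]

z = runs_test_z(data)  # strongly negative: far fewer runs than random data would give
bound = 2 / math.sqrt(n)  # approximate limit for "no significant correlation"
acf = {lag: autocorrelation(data, lag) for lag in range(1, 26)}
first_small_lag = next((lag for lag, r in acf.items() if abs(r) < bound), None)

print("runs test z:", round(z, 1))
print("lag-1 autocorrelation:", round(acf[1], 2))
print("first lag inside the bound:", first_small_lag)
```

For a periodic signal like this one, the autocorrelation swings negative again at longer lags, so the first lag inside the bound is only a starting point for choosing a sampling interval.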