In all of my Lean Six Sigma classes we talk about the quality of the data, and how we often find it lacking.
Today I found another blog post that includes a detailed analysis of a biased data source: a comparison of the movie rating sites that aggregate online reviews to provide a score.
The post “Be Suspicious Of Online Movie Ratings, Especially Fandango’s” by Walt Hickey on the FiveThirtyEight blog shows a comparison between Fandango ratings and those of almost every other rating site.
Observed Movie Biases
- On Fandango, no movie was rated below 3.0 stars out of 5, while every other rating service has movies rated as low as 1 star. The author proposes that this is because Fandango is owned by a movie studio and is in the business of selling movie tickets, which means it has a stake in your choosing a movie on its site.
- Fandango only rounds up, to the next highest half point: a 4.1 rounds to 4.5. This is not how rounding is supposed to work. Most people expect rounding to the nearest half point, where a 4.1 would become a 4.0-star movie.
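The two rounding rules are easy to contrast in a few lines of Python. This is a sketch of the behavior Hickey describes, not Fandango's actual code:

```python
import math

def round_nearest_half(score):
    """Conventional rounding: to the nearest half point."""
    return round(score * 2) / 2

def round_up_half(score):
    """Round-up-only rounding: always to the next highest half point."""
    return math.ceil(score * 2) / 2

print(round_nearest_half(4.1))  # 4.0
print(round_up_half(4.1))       # 4.5
```

Applied across a whole catalog, the round-up-only rule shifts every displayed rating toward the high end, which compounds the "nothing below 3.0 stars" bias above.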
What does this mean to a Lean Six Sigma practitioner?
In my personal experience, the quality of many data sources is not what you think or assume that it should be. I have found so many issues with data that I now perform a “Walk the Data” analysis on every project. I record an event or action and then follow how that value is recorded, transcribed, and modified on its way to the data set that I have received for my improvement project. This is no different from a practitioner “Walking the Line” (a gemba walk for you lean folks) to observe what is actually being performed.
Things that I have experienced:
- A client used ERP software that generated factory work orders to fulfill a customer order. The product engineer would determine the amount of product needed to meet the customer need and then add a buffer to account for potential yield loss. This is a common practice to ensure you do not ship a short order to a customer. What no one in the organization knew was that the ERP software also added an overage to every order to account for common yield loss. No wonder the company had a project to reduce the cost to produce its products.
- I worked on a project to reduce late completions of a complex process. The work crews were running weeks late on an expected three-week project. We knew a lot of work stoppages were being recorded, so I started the project by collecting all the data on the recorded stoppages and found a few process changes that would reduce the occurrence of some of the stoppage causes. As I was wrapping up the project, I noticed that I had not “walked the line” and truly observed the process being executed, so off I went for a half-day observation. The next day I checked the logs and found that two of the work stoppages I had observed were not in them. When I questioned the work team leader, I was told that those two events were work delays, not work stoppages: a planned stoppage was a delay, and an unplanned stoppage was a stoppage. Of course, no one tracked the delays. I had to redo the entire project, including the delay data. When site leadership found out that the operations group was not reporting these work delays, they finally understood why everything was late.
- Many projects are about reducing the lead time of a process. In many of these processes, the lead time is reported in integer day values. If you receive lead time data in integer days, what does a three-day lead time mean?
- If the system just subtracts two date values, a reported three-day lead time can be anywhere between 2 and 4 days of actual elapsed time: start at the end of day 1 and finish at the beginning of day 4 (about 2 days), or start at the beginning of day 1 and finish at the end of day 4 (about 4 days).
- If the system subtracts two date/time values, you obtain a lead time in days and hours. This is much better than integer days, but how did the value account for non-work time such as weekends? Yes, customer lead time should include weekend time, but internal planning lead times should only include scheduled workdays. I have found both cases used without the company knowing how weekends were counted.
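A short Python sketch (with made-up dates and times) shows how integer-day date subtraction hides the spread that a date/time subtraction reveals:

```python
from datetime import date, datetime

# Date-only subtraction: both scenarios below report "3 days",
# even though the actual elapsed times differ by almost two days.
start, finish = date(2024, 3, 1), date(2024, 3, 4)
print((finish - start).days)  # 3

# Datetime subtraction preserves hours, exposing the spread.
# Start late on day 1, finish early on day 4: just over 2 days.
short_elapsed = datetime(2024, 3, 4, 0, 30) - datetime(2024, 3, 1, 23, 0)
# Start early on day 1, finish late on day 4: almost 4 days.
long_elapsed = datetime(2024, 3, 4, 23, 0) - datetime(2024, 3, 1, 0, 30)
print(short_elapsed.total_seconds() / 86400)  # ~2.06 days
print(long_elapsed.total_seconds() / 86400)   # ~3.94 days
```

Note that neither subtraction says anything about weekends; whether non-work days are excluded is a separate decision the system has made for you, and it is worth walking the data to find out which one.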
- Workforce efficiency reporting has routine biases. We all assume that efficiency is reported as value-add work hours over scheduled work hours, but I have seen many organizations that do not use scheduled hours as the denominator. One place counted only the hours available for work: the group removed hours for meetings, shift turnovers, delays caused by other groups, and any other non-work time that could not be assigned to it. Ignoring all non-work time with external causes gave the group a great efficiency compared to other groups. Leadership had no idea that the managers were not using the standard calculation.
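The gap between the two calculations is easy to see with illustrative numbers (the hours below are assumptions for the sketch, not figures from any client):

```python
# One crew's week, in hours (illustrative values).
value_add_hours = 30.0
scheduled_hours = 40.0
# Hours the group excluded from its denominator:
# meetings, shift turnovers, delays caused by other groups.
excluded_hours = 6.0

# Standard calculation: value-add hours over scheduled hours.
standard_efficiency = value_add_hours / scheduled_hours
# Biased calculation: value-add hours over "hours available for work".
biased_efficiency = value_add_hours / (scheduled_hours - excluded_hours)

print(f"{standard_efficiency:.1%}")  # 75.0%
print(f"{biased_efficiency:.1%}")    # 88.2%
```

Same crew, same week, a 13-point difference in reported efficiency, driven entirely by a denominator choice that never appears on the scorecard.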
- I shared with one of my employers an experience from a book by Mark Graham Brown, a speaker I had brought in. Kentucky Fried Chicken tasked him with understanding why one location maintained such a high “Chicken Efficiency” metric compared to the rest of the franchises. Chicken Efficiency was calculated by dividing the chicken sold by the chicken cooked; it was a corporate metric to track food losses. When he visited the site with the extraordinary Chicken Efficiency, he found that the employees quit cooking chicken to keep in queue after the dinner rush ended. For the next three hours they would only cook to order. During the visit he also saw that over 50% of the customers walked out without a purchase when they heard it would be 12 minutes before their order was ready.
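A minimal Python sketch makes the perverse incentive concrete; the piece counts below are illustrative assumptions, not data from the book:

```python
def chicken_efficiency(pieces_sold, pieces_cooked):
    """Corporate food-loss metric: chicken sold / chicken cooked."""
    return pieces_sold / pieces_cooked

# A typical store keeps cooked chicken in queue, so some goes unsold.
print(f"{chicken_efficiency(90, 100):.0%}")  # 90%

# The "high-efficiency" store cooked only to order after the rush,
# so almost everything cooked was sold...
print(f"{chicken_efficiency(59, 60):.0%}")  # 98%

# ...but the ratio never counts the customers who walked out rather
# than wait 12 minutes, so the lost sales are invisible to the metric.
```

The metric rewarded exactly the behavior that was driving customers away, because lost demand appears in neither the numerator nor the denominator.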
I expect that every Lean Six Sigma practitioner has stories like these. We have simply quit assuming that business-provided data represents exactly what we are told it does. But what does this mean for our efforts to create dashboards and scorecards?
Metric Bias Impact on Scorecards
If we routinely find data bias, data manipulation, data errors, and more when we obtain data from a business, what should we expect of the quality of its performance scorecards? I expect that scorecard data quality is equally questionable; questionable enough that leadership may not want to act on signals in the data. Realize that most business managers and executives moved up to these positions because they could show that their performance was better than their peers’, based on data which they themselves may have manipulated.
Maybe dashboards and scorecards are not that important to business leaders because they do not trust the data quality, no matter how it is shown.