Comparing yields when they are good (Chi-square test)
December 19th, 2007
A student called about the proper test to evaluate the differences in yields between eight different products. They told me nothing works.
The yield in question was for electrical interconnections, and each product had a different number of interconnections per item. The number of opportunities therefore varied so much between products that the sample sizes could not be treated as equal, or even close.
The hypothesis test roadmap was followed, and it led to a chi-square test or analysis of means (ANOM). But neither worked.
ANOM could not be used due to the varying number of opportunities.
Upon questioning, the problem turned out to be that the yields were too good: 100% on a couple of the products. You cannot make attribute comparisons without defects, because without them there is no information to compare. So where do you go?
First, the chi-square test for tables (in Minitab), or chi-square test for independence (in statistics books), has a data requirement that must be met before it works well: every cell in the table needs an expected (predicted) count of 5 or more. In practice this usually means that each cell in the table needs an observed count of 5 or more. If you do not have that, what can you do?
The standard answer is to combine categories with small counts until the combined category has a count of more than 5, much as a Pareto charting program lumps the last 5% or so of choices together. This is difficult if the grouping has no physical meaning. It also does not work if all the data are zeros. The real risk with low table counts is that the chi-square test will underestimate the true p-value (declare significance before it should). If the counts are less than 5 and the p-value is high, there is no real worry.
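The expected-count requirement can be checked before running the test. The sketch below computes each cell's expected count as (row total x column total) / grand total and lists the cells that fall below 5. The counts for the eight products are made up for illustration, and the function names are hypothetical, not taken from Minitab or any library.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def cells_below(table, minimum=5):
    """Return (row, column, expected) for every cell with expected < minimum."""
    expected = expected_counts(table)
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < minimum
    ]


# Hypothetical data: one row per product, columns are [good, defective].
counts = [
    [500, 0],
    [1200, 3],
    [80, 0],
    [2500, 12],
    [300, 1],
    [950, 4],
    [60, 2],
    [4000, 9],
]

for row, col, e in cells_below(counts):
    print(f"product {row + 1}, column {col}: expected count {e:.2f} is below 5")
```

With very high yields the defective column is where the small expected counts show up, even for products with hundreds of opportunities.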
What can you do? There is no clear hypothesis test method, but a few can be misused.
How about doing a whole series of two-proportion tests? With eight categories there would be 28 paired comparisons [n*(n-1)/2 = 8*7/2]. You can do this, but you need to set the confidence level of each test at 99.82% so that the overall confidence is 95% (.9982^28 = .95). This is the same concept used for the multiple comparisons performed as part of ANOVA.
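The arithmetic behind the 99.82% figure can be sketched in a few lines. This assumes the tests are treated as independent, so the per-test confidence is the overall confidence raised to the power 1/(number of pairs). The function name is hypothetical.

```python
from math import comb


def per_test_confidence(n_groups, overall_confidence=0.95):
    """Number of pairwise comparisons and the confidence each one needs."""
    n_pairs = comb(n_groups, 2)  # n * (n - 1) / 2
    return n_pairs, overall_confidence ** (1 / n_pairs)


n_pairs, confidence = per_test_confidence(8)
print(f"{n_pairs} comparisons, each at {confidence:.2%} confidence")
```

Equivalently, each two-proportion test would be run at a significance level of about 0.0018 instead of 0.05.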
One possible way to compare the groups is to generate a confidence interval for each product and compare them graphically. This would show differences; it would not be a hypothesis test, but it would probably be adequate for making the proper business decisions.
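As a rough sketch of the confidence-interval approach, the code below computes an interval on the yield of each product. The Wilson score interval is used here as an assumption, since the text does not name a method; it was chosen because it still gives a usable interval when the observed yield is 100%. The product counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(good, total, confidence=0.95):
    """Wilson score interval for a proportion (here, yield = good / total)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = good / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical data: (good, total opportunities) for each product.
products = {
    "A": (500, 500),
    "B": (1197, 1200),
    "C": (80, 80),
    "D": (2488, 2500),
}

for name, (good, total) in products.items():
    low, high = wilson_interval(good, total)
    print(f"{name}: yield {good / total:.4f}, 95% interval {low:.4f} to {high:.4f}")
```

Plotting the intervals side by side gives the graphical comparison. A product with 100% yield on few opportunities will show a wide interval, which makes it clear how little the perfect score actually says.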