Monday, December 11, 2017

The Slippery Measurement System Paradox

1. Background


   This article addresses how to proceed when a Measurement System Analysis (MSA for short)
is required, parts from the process have to be selected and measured to run the MSA, and no
measurement system is available other than the one that is going to be evaluated.

   Some authors and manuals set criteria on how to select units from the process to run the
MSA. Some simply say the units have to cover most of the actual range, some specify that the
units must represent at least 80% of the current range of the process, and others require 100%
of either the range or the tolerance width, depending on the purpose of the Measurement
System. All these empirical rules share the same underlying idea: the units/parts selected have
to be representative of the process that is going to be measured. MSA techniques such as
Gage R&R compare the uncertainty of the Measurement System to the variability of the units
themselves.

   It has to be said that there is no such thing as a good or bad measurement system; it is
either acceptable or not acceptable for measuring the characteristic we need to measure. All
Measurement Systems have uncertainty. The only thing MSA techniques do is determine whether
that uncertainty is small enough to be dismissed in comparison to the variation between parts.

   Doubts about whether we are proceeding correctly then come up immediately. If we have to
select parts/units from the process so that they are representative of it, and we only have the
Measurement System (MS for short) that is going to be evaluated, how do we know the parts are
selected correctly? Moreover, and this is the most important question: if the MS is not
acceptable and we do not know it yet, is it possible that the analysis would tell us the MS is
acceptable when in reality it is not?

   In the following lines we are going to explain that the above situation is indeed a
paradox, which means it cannot happen. We are going to argue and demonstrate that it is not
possible to select parts with an unacceptable MS in such a way that the MSA would tell us to
accept that MS.

2. The paradox argument

There are several mathematical expressions to compare MS uncertainty to
parts variation. The most used are:

$\%\,\text{Study Variation} = \dfrac{\sigma_{Gage}}{\sigma_{Total}} \times 100$   (1)

$\%\,\text{Contribution} = \dfrac{\sigma^{2}_{Gage}}{\sigma^{2}_{Total}} \times 100$   (2)

$\%\,\text{Tolerance} = \dfrac{6\,\sigma_{Gage}}{USL - LSL} \times 100$   (3)

$\text{Number of distinct categories (ndc)} = 1.41 \times \dfrac{\sigma_{Part}}{\sigma_{Gage}}$   (4)

where $\sigma_{Gage}$ is the variation due to the Measurement System, $\sigma_{Part}$ the part-to-part variation, $\sigma_{Total}$ the total observed variation and $USL - LSL$ the tolerance width.
The acceptance criteria are:

% Study Variation < 10%
% Contribution < 1%
% Tolerance < 10%
Number of distinct categories ≥ 14

   Please note that acceptance criteria can vary among different uses and organizations.
Here we show typical values.
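
To make the metrics concrete, here is a minimal sketch in Python (not part of the original analysis) that computes equations (1) to (4) from estimated variance components; the variance values and the tolerance below are made-up numbers, used only for illustration:

# Minimal sketch: Gage R&R metrics of equations (1)-(4) from variance components.
# The variance components and tolerance are hypothetical, for illustration only.
import math

var_part = 0.040   # hypothetical part-to-part variance component
var_gage = 0.002   # hypothetical total Gage R&R variance component
tolerance = 1.2    # hypothetical tolerance width (USL - LSL)

var_total = var_part + var_gage
sd_part, sd_gage, sd_total = map(math.sqrt, (var_part, var_gage, var_total))

pct_study_var = 100 * sd_gage / sd_total        # equation (1)
pct_contribution = 100 * var_gage / var_total   # equation (2)
pct_tolerance = 100 * 6 * sd_gage / tolerance   # equation (3)
ndc = 1.41 * sd_part / sd_gage                  # equation (4)

print(f"% Study Variation: {pct_study_var:.1f}%")
print(f"% Contribution:    {pct_contribution:.1f}%")
print(f"% Tolerance:       {pct_tolerance:.1f}%")
print(f"ndc:               {ndc:.1f}")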

   Let us look at equation (1) and imagine we are using a MS which is not acceptable in
comparison to parts variation. We know that:

$\dfrac{\sigma_{Gage}}{\sigma_{Total}} \times 100 > 10\%$

and that the variation we observe when measuring parts is the sum of the part variation and
the Gage variation:

$\sigma^{2}_{Total} = \sigma^{2}_{Part} + \sigma^{2}_{Gage}$   (5)
   We can say that in an unacceptable MS the variation due to the MS itself (Gage) is big in
comparison to parts variation, and thus it is also a big proportion of the total. This last
statement is the key to what cannot happen if we use an unacceptable MS. The range measured
during the parts selection process, and therefore the observed variation, behaves as equation
(5) shows: it already contains the Gage variation. Therefore, the only thing that may happen is
that the selected parts span less than the range of the process and do not meet what the
practical rules recommend. That would imply that the Gage variation is an even bigger
proportion of the whole variation. The conclusion driven by equations (1) to (4) and their
respective acceptance criteria is that the MS would be declared not acceptable, as it actually is.
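
A small numeric sketch of this argument, with hypothetical numbers: keeping the gage standard deviation fixed, selecting parts that span less than the process range can only make % Study Variation larger, never smaller:

# Minimal sketch (hypothetical numbers): with a fixed gage standard deviation,
# narrowing the spread of the selected parts only increases % Study Variation.
import math

sd_gage = 0.05                      # hypothetical gage standard deviation
for sd_part in (0.20, 0.10, 0.05):  # full process spread, then narrower selections
    sd_total = math.sqrt(sd_part**2 + sd_gage**2)
    pct_sv = 100 * sd_gage / sd_total
    print(f"sd_part = {sd_part:.2f} -> % Study Variation = {pct_sv:.1f}%")
# Output: 24.3%, 44.7%, 70.7% -- the unacceptable MS never looks acceptable.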

   Based on the previous argument, we can also state that a MS cannot be accepted when it is
not acceptable, since that is not a possible situation; hence the paradox.

3. Discussion and Conclusions

   We have demonstrated in a simple way that a MS with large uncertainty cannot be evaluated
as acceptable when using Gage R&R techniques.

   The main conclusion is that when we need to analyze and evaluate a MS and parts/units
have to be selected, we can use that same MS for the selection.

   The risk of evaluating an unacceptable MS as acceptable does not exist, based on "the
Slippery Measurement System Paradox".

Tuesday, December 6, 2016

How to decide the proper Beta Risk for our Process

Let’s start with a refresher…

Beta Risk in Measurement Systems is also known as Consumer’s Risk.

It is so called because Beta Risk is the proportion of defective parts assessed by the measurement system as OK parts and therefore sent to the internal or external customer. It is the proportion of defective parts not detected by our measurement system. In every Measurement System there is always a Beta Risk greater than zero. But what is the proper one? Is it always the typical 10% (sometimes 20%) mentioned in the technical literature?
The question is: is Beta the actual Consumer's Risk? Is Beta the proportion of NOK parts actually sent to the next step of the process or to the customer?

Indeed it is not! Beta is not the proportion of NOK parts sent to the customer out of the total. It is the proportion of NOK parts not detected by the Measurement System out of the total NOK parts produced. So the Consumer's Risk depends on Beta, but it also depends on the proportion of NOK parts produced by the process.

How to decide a proper Beta Risk

In the real world, Beta is always a risk that implies a cost: a cost sometimes driven by
sample size, sometimes by technology constraints, …

The correct approach is to consider not only Beta but also the proportion of NOK parts produced by the process, as follows:

Proportion of NOK parts sent to the customer = Beta x p

Where:
- Beta: number of NOK parts assessed as OK / total number of NOK parts
- p: proportion of NOK parts produced by the process
NOTE: Beta and p are estimated parameters, so it is very important that a significant and representative sample is taken to estimate them.
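
As a minimal numeric sketch of this relationship (the Beta and p values below are hypothetical):

# Minimal sketch (hypothetical numbers) of the relationship above:
# proportion of NOK parts reaching the customer = Beta x p.
beta = 0.10   # hypothetical: 10% of NOK parts are not detected by the MS
p = 0.001     # hypothetical: the process produces 1 NOK part per 1000

consumer_risk = beta * p
print(f"Proportion of NOK parts sent to the customer: {consumer_risk:.6f}")  # 0.000100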

From what has been said, we have to decide what risk we are willing to assume of sending NOK parts to the customer, and then decide Beta and p, which always imply a cost.

Example
Imagine we are talking about a characteristic classified as a Significant Characteristic by our customer, so a Ppk of 1,33 (long-term capability) is the minimum value of Ppk, i.e. the maximum risk, our customer allows us to have for that characteristic.

That means the allowed proportion of NOK parts would be 34 parts per million. Expressed as a proportion: 0,000034.

Imagine our process is performing with a proportion of 1 Defect out of 1000 parts, which is p=0,001.

If we work out Beta using the formula from above: Beta = 0,000034/0,001=0,034 -> 3,4%

So in this example we would need our measurement system to have a Beta Risk of at most 3,4%, which is better (lower) than the "typical" value of 10%.

That would be the case if the process is running and it is not possible to improve the proportion of defective parts it produces (p), at least not immediately. In the meantime we have to protect our customer with a Measurement System performing with a Beta of at most 3,4%. Once the customer is protected, we can analyze the problem, find the root causes and improve the process without pressure.

Now imagine the process is improved and defects are reduced by 90%, so we end up with p=0,0001 (1 defect out of 10000 parts). Then the allowed Beta would be: Beta = 0,000034 / 0,0001 = 0,34 -> 34%.
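
The arithmetic of both scenarios can be reproduced with a short sketch (numbers taken from the example above):

# Minimal sketch reproducing the example: maximum allowed Beta = p_allowed / p_process.
p_allowed = 0.000034   # max proportion of NOK parts allowed by the customer (Ppk 1.33 example)

for p_process in (0.001, 0.0001):          # before and after the 90% improvement
    beta_allowed = p_allowed / p_process   # from Beta x p <= p_allowed
    print(f"p = {p_process} -> maximum allowed Beta = {beta_allowed:.1%}")
# Output: 3.4% and 34.0%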

Once the process is improved, we would not need so much effort on the measurement system anymore.

FINAL CONCLUSION:
The Beta Risk we need depends on the actual risk our customer allows us to assume and on how our process is performing (process capability).

Saturday, December 3, 2016

What about Beta in Hypothesis testing?

The main purpose of this blog is to clarify the Beta Risk concept, how it works and how to interpret its value. This post is intended for Black Belts, so that they have a clear and complete understanding of how to deal with hypothesis testing results in different situations.

Alpha Risk and its relationship with the p-value are better understood than Beta. Nevertheless, let's see a refresher, as Beta and Alpha both come into play when using hypothesis testing methods.
Let's look at an example. In this first example we are going to reject the Null Hypothesis because the p-value is less than the Alpha risk.

Example:

We are considering changing suppliers for a part that we currently purchase from a supplier that charges us a premium for the hardening process. 
The proposed new supplier has provided us with a sample of their product.  They have stated that they can maintain a given characteristic of 5 on their product.
We want to test the samples and determine if their claim is accurate.
Statistical Problem:
H0: μN.S. = 5
Ha: μN.S. ≠ 5
Set Risk Levels and choose test:
  1-Sample t Test (population Standard Deviation unknown, comparing a sample mean to a target).
  α = 0.05       β = 0.10

We have a set of samples as shown below...
We are going to use a 1-sample t-test to test our Null hypothesis (mean = 5) because we can assume a normal distribution and we do not know the historical standard deviation of this new supplier.

One-Sample T: Values
Test of μ = 5 vs ≠ 5
Variable  N    Mean   StDev  SE Mean       95% CI                    T    P-Value
Values    9  4,7889  0,2472   0,0824  (4,5989; 4,9789)  -2,56  0,034
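
Since the raw data are not reproduced here, the same test can be sketched in Python from the summary statistics in the output above (an illustration, not the original Minitab run):

# Minimal sketch: one-sample t-test reproduced from the summary statistics above.
from math import sqrt
from scipy import stats

n, mean, sd, mu0 = 9, 4.7889, 0.2472, 5.0
se = sd / sqrt(n)                       # standard error of the mean
t = (mean - mu0) / se                   # t statistic
p = 2 * stats.t.sf(abs(t), df=n - 1)    # two-sided p-value
ci = stats.t.interval(0.95, n - 1, loc=mean, scale=se)  # 95% CI for the mean

print(f"t = {t:.2f}, p-value = {p:.3f}, 95% CI = ({ci[0]:.4f}; {ci[1]:.4f})")
# Roughly: t = -2.56, p-value = 0.034, 95% CI = (4.5989; 4.9789)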

Conclusion:
As the p-value is less than our criterion Alpha, which was 0,05 (5%), we reject the Null. If the supplier's true mean really were 5, the probability of getting a sample mean at least as far from 5 as the one we observed would be only 0,034. This probability is considered low, as our criterion for what is low was 5% (Alpha). So we reject the Null hypothesis and say that there is enough statistical evidence, at a 95% confidence level, that the new supplier is not performing with an average of 5.

As we reject the Null, we do not care about Beta, which is the risk of being wrong when we fail to reject the Null Hypothesis.

Now imagine a second scenario in which we fail to reject. The correct statement then is: "we do not have enough statistical evidence to reject the Null hypothesis, which was mean = 4,9, so we have to accept it".
In this case there are two possibilities for what is truly happening. One is that they are different and, due to the sample size, we are not able to detect some amount of difference; we simply have too much risk (Beta risk) for a certain difference.
The other possibility is that they are not different.
Let’s see how Beta works so to have a more clear view about what could be happening. For this pupose we have to use Beta as we are failing to reject. One could think that Beta works in a similar way as Alpha does. Then we could work out using Minitab Power and Beta, which is 1-power using the data we have. So the result using power and sample size would be: (see next result).

Power and Sample Size
1-Sample t Test
Testing mean = null (versus ≠ null)
Calculating power for mean = null + difference
α = 0,05  Assumed standard deviation = 0,2472
            Sample
Difference    Size     Power
    0,1111      24  0,559395
Beta = 1-0,559 = 0,441
By saying that Beta is 0,441 we are saying that, if the population mean were the one estimated from the sample (a difference of 0,1111 from the target), we would fail to detect that difference in 44,1% of the cases.
This has little to do with the real problem, as we simply do not know the population mean; we cannot assume that the population mean is the one from the sample.
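
The same power calculation can be sketched in Python with statsmodels; the result should be close to the Minitab output above:

# Minimal sketch of the power calculation above using statsmodels instead of Minitab
# (difference 0.1111, sd 0.2472, n = 24, alpha = 0.05).
from statsmodels.stats.power import TTestPower

diff, sd, n, alpha = 0.1111, 0.2472, 24, 0.05
power = TTestPower().power(effect_size=diff / sd, nobs=n, alpha=alpha,
                           alternative='two-sided')
beta = 1 - power
print(f"Power = {power:.3f}, Beta = {beta:.3f}")   # roughly Power = 0.559, Beta = 0.441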

The way Beta has to be used in this case is: given the sample we have been provided with, we decide which risk we consider reasonable to assume, let's take the typical one, Beta = 10%. In this case, using Minitab:

Power and Sample Size
1-Sample t Test
Testing mean = null (versus ≠ null)
Calculating power for mean = null + difference
α = 0,05  Assumed standard deviation = 0,2472
Sample
  Size  Power  Difference
    24    0,9    0,170852

Interpretation
The statement would be that, with a sample size of 24 and a standard deviation of 0,2472, the smallest difference we can detect with a 10% risk (Beta) is 0,171.
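
The same calculation, sketched with statsmodels (solving for the detectable difference instead of the power):

# Minimal sketch (statsmodels instead of Minitab): detectable difference for
# n = 24, alpha = 0.05 and power = 0.90 (Beta = 10%).
from statsmodels.stats.power import TTestPower

sd, n, alpha, power = 0.2472, 24, 0.05, 0.90
effect_size = TTestPower().solve_power(effect_size=None, nobs=n, alpha=alpha,
                                       power=power, alternative='two-sided')
print(f"Detectable difference ~ {effect_size * sd:.3f}")   # roughly 0.171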

Conclusion
If a difference of 0,171 and/or a risk of 10% is not good enough for us, we have to put the required difference and risk into Minitab and take enough samples.

Imagine, using the same example, that a difference of 0,1 is not allowed, so we want to be able to detect that difference with a 10% risk (Beta risk). Therefore:

Power and Sample Size
1-Sample t Test
Testing mean = null (versus ≠ null)
Calculating power for mean = null + difference
α = 0,05  Assumed standard deviation = 0,2472
            Sample  Target
Difference    Size   Power  Actual Power
       0,1          67     0,9      0,903663
We would have to take at least 67 samples to detect a difference of 0,1 with a 10% risk.
Imagine that even a 10% risk is too high for us due to the cost it implies, so only a 1% risk is allowed.

With Beta=0,01 , Power=0,99
Power and Sample Size
1-Sample t Test
Testing mean = null (versus ≠ null)
Calculating power for mean = null + difference
α = 0,05  Assumed standard deviation = 0,2472
            Sample  Target
Difference    Size   Power  Actual Power
       0,1        115    0,99      0,990392
We would need a sample size of 115 to detect a difference of 0,1 with a risk of 1%.
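
Both sample size calculations can be sketched with statsmodels as well:

# Minimal sketch (statsmodels instead of Minitab): sample size needed to detect
# a difference of 0.1 with Beta = 10% and Beta = 1%.
import math
from statsmodels.stats.power import TTestPower

diff, sd, alpha = 0.1, 0.2472, 0.05
for power in (0.90, 0.99):
    n = TTestPower().solve_power(effect_size=diff / sd, nobs=None, alpha=alpha,
                                 power=power, alternative='two-sided')
    print(f"Power = {power}: sample size = {math.ceil(n)}")   # roughly 67 and 115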

SUMMARY:

- From the sample we know the estimates for the standard deviation and the mean, so we can work out how likely a sample like ours would be for a given value of the population parameter, for example the mean (this probability is the p-value).
- Following the statement above, the only thing we know about the population (thanks to a sample) is the probability distribution of the sample.
- Beta is the risk of not detecting a certain difference when the population changes. As we do not know the value of the population parameter, we cannot take the difference between the sample value and the target as certain in order to work out a "real" Beta. The sample value can only be used to work out a distribution, which means we have infinite possible differences and infinite probabilities for those differences, so infinite values for Beta as well.
- The way to use Beta is to decide which risk we could assume of not detecting a certain difference if that difference really existed; this then gives us a sample size. Or, the other way around, when we are provided with a certain sample, it tells us which difference and which risk we are assuming.
- Beta cannot be used to accept or reject the Null Hypothesis. It has to be used to decide what sample size to take, or whether we already have enough samples.