Monday, August 6, 2018

Method for the estimation of Process Capability when sample size is less than 30

Summary

The use of capability indexes or metrics is vital in industry. It allows engineers to predict how the
process is going to behave in terms of its ability to meet customer expectations. There are some rules to determine the appropriate sample size needed to estimate process capability with the highest possible precision. For instance, the best-known rule states that the minimum sample size should be 30 individual measures. Sometimes, for different reasons such as lack of time or budget, engineers cannot afford 30 units/measures for their capability studies.
Ignoring this fact and assuming that the traditional formula for Cpk estimation is still applicable
with fewer than 30 observations is a mistake. This document explains a method to estimate process capability with fewer than 30 measures without disregarding estimation uncertainty. It is developed using two different approaches: the first one presents the worst case, which is the lower bound, and the second one extends it to estimate the whole confidence interval, lower and upper bounds, for a given α.

1 Introduction and method fundamentals

The rule of 30 units as the minimum sample size for capability estimation comes from one of
the consequences of the Central Limit Theorem (CLT), which is:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

It means that if we have a random variable with a certain probability distribution and we estimate
its standard deviation, the standard deviation of the mean, when we average several observations
instead of taking individuals, will be the standard deviation of the individual observations divided by the
square root of the sample size (n).
If we draw the function $1/\sqrt{n}$ in a chart, the result is shown in Figure 1:

[Figure 1: $1/\sqrt{n}$ as a function of the sample size n]
In Figure 1 we can see why the rule of 30 units as a minimum is used. With 30 units the variability of the mean has almost converged and is quite far away from the “elbow” of the chart.

Nevertheless, it depends on the precision we want for our parameter estimation. One can argue
that it is not until about 200 units that the asymptotic function converges into an almost
parallel line to the horizontal axis. In general, we can also say that the improvement we get
from increasing the sample size from 30 to 200 units is small compared to the difference
when we move in the other direction, that is, when we decrease the sample size, since we
get close to the “elbow” of the chart very rapidly.
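This diminishing return is easy to check numerically. A minimal Python sketch (illustrative values only):

```python
import math

sigma = 1.0  # standard deviation of the individual observations (illustrative)

# Standard deviation of the mean for several sample sizes (CLT consequence)
for n in [2, 5, 10, 30, 100, 200]:
    print(f"n = {n:3d}  ->  sigma_mean = {sigma / math.sqrt(n):.3f}")
```

Between n = 2 and n = 30 the factor drops from about 0.71 to 0.18, while between n = 30 and n = 200 it only drops from 0.18 to 0.07, which is the “elbow” effect of Figure 1.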
We may think that fulfilling the rule of not estimating process capability with fewer than 30 units
is enough. But in many cases, having 30 units is an unaffordable luxury, or it is just not possible for other reasons. Recall that the expression for Cpk estimation is:

$$C_{pk} = \min\left(\frac{USL - \bar{x}}{3S},\; \frac{\bar{x} - LSL}{3S}\right) \quad (1)$$

where USL and LSL are the Upper Specification Limit and the Lower Specification Limit of the
characteristic whose capability we are estimating, $\bar{x}$ is the sample mean and S is the sample standard deviation.
If we look at expression (1), we can conclude that the mean is not the only parameter playing a role in Cpk
estimation uncertainty: S is also important. Nevertheless, if we represent how the σ
estimation uncertainty changes as n increases, in a similar way as we did for the mean, the result is shown in
Figure 2. In this case we use the Chi-Square distribution to work out the upper bound of the confidence
interval for σ; we will use the complete expression in the next section.

[Figure 2: upper bound of the confidence interval for σ, relative to S, as a function of the sample size n]
Looking at Figure 2, we can conclude that the estimation of sigma converges earlier
than the estimation of the mean. That is the same as saying that the mean is the more
critical parameter in terms of uncertainty due to sample size. This means that we still have the same critical
value of 30 units as a minimum, but we will use both confidence intervals, the one for the mean
and the one for the standard deviation, to estimate Cpk.
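The chi-square computation behind Figure 2 can be sketched in a few lines of Python (scipy.stats; α = 0.05 is an illustrative choice), using the upper-bound factor σUB/S that the complete expression of the next section will formalize:

```python
import math
from scipy import stats

alpha = 0.05  # significance level (illustrative choice)

# Upper-bound factor for sigma: sigma_UB = S * sqrt((n - 1) / chi2_{alpha/2, n-1}),
# where chi2_{p, n-1} is the lower-tail quantile of the Chi-Square distribution
for n in [5, 10, 30, 100, 200]:
    factor = math.sqrt((n - 1) / stats.chi2.ppf(alpha / 2, df=n - 1))
    print(f"n = {n:3d}  ->  sigma_UB / S = {factor:.3f}")
```

With 30 units the upper bound is still about 34% above S, which is why the method below keeps this uncertainty in the Cpk expression instead of neglecting it.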

2 Methodology

The reason why the traditional method uses expression (1), which disregards the
uncertainty in the estimation of σ and μ, is that their estimation errors are considered negligible
when the sample size is 30 or more. We have shown this idea in Figures 1 and 2.
The same logic tells us that when the sample size is less than 30, we cannot consider those
uncertainties negligible.
In the following lines, we present the fundamentals of a method for the
estimation of Cpk when the sample size is less than 30 individual measures.
When we do not know the standard deviation of the population, we have to use Student's t
distribution for the estimation of the confidence interval of the mean. The expression for this purpose
is equation (2):

$$\mu_{UB,LB} = \bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \quad (2)$$
Nevertheless, we can use the Z distribution if we consider the upper bound of the confidence interval
of σ (instead of S), which is our worst case, since the bigger the standard deviation, the
smaller the Cpk. We can see it the opposite way: if we used the t distribution in the
expression together with the upper bound of σ, we would be accounting for the uncertainty of not knowing the
standard deviation of the population twice, which would not be correct.
Therefore, by using the following expressions (3) and (4) together, we will estimate upper bounds
(UB) and lower bounds (LB) for both the mean and the standard deviation from the sample:

$$\mu_{UB,LB} = \bar{x} \pm z_{\alpha/2}\,\frac{\sigma_{UB}}{\sqrt{n}} \quad (3)$$

$$\sigma_{UB,LB} = S\,\sqrt{\frac{n-1}{\chi^2_{(\alpha/2;\;1-\alpha/2),\,n-1}}} \quad (4)$$

where $\chi^2_{p,\,n-1}$ denotes the quantile of order p of the Chi-Square distribution with n−1 degrees of freedom, so the lower-tail quantile $\chi^2_{\alpha/2,\,n-1}$ gives $\sigma_{UB}$ and the upper-tail quantile $\chi^2_{1-\alpha/2,\,n-1}$ gives $\sigma_{LB}$.

As we have already justified, for the worst-case approach we will only use the UB from expression (4), as it is the
worst case. From expression (3) we will get both the UB and the LB for the mean.
For this method, we can use either expression (2) or expression (3) to estimate the CI of the mean.
Although for n < 30 the width of the CI from equation (3) is bigger than the one from (2) for the
same nominal coverage, both expressions give very similar results in practical terms.
Therefore, the estimation of Cpk with fewer than 30 measures should be worked out using
expression (5):

$$C_{pk} = \min\left(\frac{USL-\mu_{UB}}{3\,\sigma_{UB}},\; \frac{USL-\mu_{LB}}{3\,\sigma_{UB}},\; \frac{\mu_{UB}-LSL}{3\,\sigma_{UB}},\; \frac{\mu_{LB}-LSL}{3\,\sigma_{UB}}\right) \quad (5)$$

The Cp index would be estimated using the same approach, but in that case the estimation is
much easier, since the mean does not appear in the Cp expression. We could estimate the potential
capability by using expression (6):

$$C_p = \frac{USL - LSL}{6\,\sigma_{UB}} \quad (6)$$

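A minimal Python sketch of this worst-case approach (scipy.stats; the sample data and specification limits are illustrative assumptions):

```python
import numpy as np
from scipy import stats

def cpk_worst_case(x, lsl, usl, alpha=0.05):
    """Worst-case (lower-bound) Cpk and Cp for a small sample, per (3)-(6)."""
    x = np.asarray(x, dtype=float)
    n, xbar, s = len(x), np.mean(x), np.std(x, ddof=1)

    # Expression (4): upper bound of sigma from the Chi-Square distribution
    sigma_ub = s * np.sqrt((n - 1) / stats.chi2.ppf(alpha / 2, df=n - 1))

    # Expression (3): Z interval for the mean, using sigma_UB as the worst case
    margin = stats.norm.ppf(1 - alpha / 2) * sigma_ub / np.sqrt(n)
    mu_ub, mu_lb = xbar + margin, xbar - margin

    # Expression (5): minimum of the four possible combinations
    cpk = min(usl - mu_ub, usl - mu_lb, mu_ub - lsl, mu_lb - lsl) / (3 * sigma_ub)

    # Expression (6): worst-case potential capability
    cp = (usl - lsl) / (6 * sigma_ub)
    return cpk, cp

rng = np.random.default_rng(1)
sample = rng.normal(10.0, 0.5, size=12)  # n = 12 < 30 (illustrative data)
print(cpk_worst_case(sample, lsl=8.0, usl=12.0))
```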
2.1 Confidence Interval Estimation for Capability Indexes

If we look at equation (5), it becomes obvious that this expression is indeed estimating the
lower bound of Cpk, so we can write:

$$C_{pk}^{LB} = \min\left(\frac{USL-\mu_{UB}}{3\,\sigma_{UB}},\; \frac{USL-\mu_{LB}}{3\,\sigma_{UB}},\; \frac{\mu_{UB}-LSL}{3\,\sigma_{UB}},\; \frac{\mu_{LB}-LSL}{3\,\sigma_{UB}}\right) \quad (7)$$

Selecting the minimum of the four options that are possible when the standard deviation is on its
upper bound works well for CpkLB (with σUB taken from equation (4)), since it gives us the lowest possible value with respect to the closest specification limit, without any contradiction with the capability definition of equation (1).
For the estimation of the upper bound, one could think of selecting the maximum when sigma is on
its lower bound, which would give us the maximum distance. The problem is that, since this would give us the larger distance to the FARTHEST limit, it would clearly contradict the Cpk definition and equation (1). To estimate the upper bound we have to select the larger distance to the CLOSEST limit, therefore not contradicting the Cpk definition. The following expression could be an option to solve the problem for Cpk's upper bound:

$$C_{pk}^{UB} = \min\left[\max\left(\frac{USL-\mu_{UB}}{3\,\sigma_{LB}},\; \frac{\mu_{UB}-LSL}{3\,\sigma_{LB}}\right),\; \max\left(\frac{USL-\mu_{LB}}{3\,\sigma_{LB}},\; \frac{\mu_{LB}-LSL}{3\,\sigma_{LB}}\right)\right] \quad (8)$$
One can think of another possible expression, which would apparently give us the same result:

$$C_{pk}^{UB} = \max\left[\min\left(\frac{USL-\mu_{UB}}{3\,\sigma_{LB}},\; \frac{\mu_{UB}-LSL}{3\,\sigma_{LB}}\right),\; \min\left(\frac{USL-\mu_{LB}}{3\,\sigma_{LB}},\; \frac{\mu_{LB}-LSL}{3\,\sigma_{LB}}\right)\right] \quad (9)$$
Equation (8) is not giving us the maximum value with respect to the closest spec limit; instead, it is giving us the minimum value with respect to the farthest limit. This can be easily verified with an example, for instance using bootstrap techniques, or with the short numerical check below. Therefore, we use expression (9), since it computes the maximum value with respect to the closest spec limit, in accordance with the Cpk definition and equation (1).
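A quick numerical check (illustrative values, with the mean close to the LSL as in case a) of Figure 3):

```python
# Illustrative values: the mean is close to the LSL
lsl, usl = 0.0, 10.0
mu_lb, mu_ub = 1.8, 2.6   # confidence bounds of the mean
sigma_lb = 0.5            # lower confidence bound of sigma

def distances(mu):
    return ((usl - mu) / (3 * sigma_lb), (mu - lsl) / (3 * sigma_lb))

eq8 = min(max(distances(mu_ub)), max(distances(mu_lb)))  # min to the FARTHEST limit
eq9 = max(min(distances(mu_ub)), min(distances(mu_lb)))  # max to the CLOSEST limit

print(f"eq (8) = {eq8:.3f}   eq (9) = {eq9:.3f}")
# eq (9) returns (mu_UB - LSL) / (3 sigma_LB) = 1.733, while eq (8) returns 4.933
```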

In Figure 3 we can see how equation (9) computes the correct value for CpkUB. In case a), μ is close to the LSL and μUB − LSL is selected. In case b), μ is close to the USL and USL − μLB is selected.

Equations (7) and (9) can be further simplified, since from expression (3):

$$\mu_{UB} = \bar{x} + z_{\alpha/2}\,\frac{\sigma_{UB}}{\sqrt{n}} \qquad \mu_{LB} = \bar{x} - z_{\alpha/2}\,\frac{\sigma_{UB}}{\sqrt{n}}$$

Then:

$$USL - \mu_{UB} = (USL - \bar{x}) - z_{\alpha/2}\,\frac{\sigma_{UB}}{\sqrt{n}} \le USL - \mu_{LB}$$

$$\mu_{LB} - LSL = (\bar{x} - LSL) - z_{\alpha/2}\,\frac{\sigma_{UB}}{\sqrt{n}} \le \mu_{UB} - LSL$$

so in equation (7) the minimum is the distance to the closest specification limit computed with the unfavorable bound of the mean, while in equation (9) the selected term is the distance to the closest specification limit computed with the favorable bound of the mean.

Therefore:

$$C_{pk}^{LB} = \frac{\min(USL-\bar{x},\;\bar{x}-LSL) - z_{\alpha/2}\,\sigma_{UB}/\sqrt{n}}{3\,\sigma_{UB}} \quad (10)$$

$$C_{pk}^{UB} = \frac{\min(USL-\bar{x},\;\bar{x}-LSL) + z_{\alpha/2}\,\sigma_{UB}/\sqrt{n}}{3\,\sigma_{LB}} \quad (11)$$
The Confidence Interval (CI) for Cpk is built on the confidence level (CL) of two parameters, μ and σ.
Therefore, the confidence level, 1 − α', for Cpk will have the following expression:

$$1-\alpha' = (1-\alpha)^k$$

Since k = 2 (two parameters):

$$1-\alpha' = (1-\alpha)^2 \quad (12)$$

Where:
- α' is the significance level corresponding to the desired confidence level for Cpk's interval.
- α is the one used to compute each of the CIs of μ and σ. It is the α from equations (2), (3) and (4).
Therefore, if we compute the CIs for μ and σ with the typical α of 0.05 (5%), the resulting α'
is 0.0975 (9.75%). Note that 0.0975 is approximately 10%, one of the values
recommended by R. Fisher for fiducial intervals and hypothesis tests and widely used since then. On the other hand, if we want to use α' = 0.05, then to compute the CIs for μ and σ we have to obtain α from equation (12), which solved for α gives equation (13):

$$\alpha = 1 - \sqrt{1-\alpha'} \quad (13)$$
If α’=0.05, then:




This relationship can also be verified for the other value recommended by Fisher, 1%. From (13):

$$\alpha = 1 - \sqrt{0.99} \approx 0.0050 \;(0.50\%)$$
Since α' = 1 − (1−α)² = α(2−α), if α << 2 then 2 − α ≈ 2, and therefore:

$$\alpha \approx \frac{\alpha'}{2} \quad (14)$$
Expression (14) holds for most practical cases, including the values recommended by R.
Fisher and usually applied (0.1, 0.05 and 0.01).
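A one-line check of equations (13) and (14) in Python:

```python
import math

for alpha_prime in [0.10, 0.05, 0.01]:
    exact = 1 - math.sqrt(1 - alpha_prime)  # equation (13)
    approx = alpha_prime / 2                # equation (14)
    print(f"alpha' = {alpha_prime:.2f} -> alpha = {exact:.5f} (approx. {approx:.5f})")
```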

And for Cp we have to generalize expression (6) to compute both confidence bounds, as follows:

$$C_p^{LB} = \frac{USL - LSL}{6\,\sigma_{UB}} \quad (15) \qquad C_p^{UB} = \frac{USL - LSL}{6\,\sigma_{LB}} \quad (16)$$
3 Synthesis

To use confidence intervals for capability indexes we have to use the following expressions:
- For Cpk, expressions (10) and (11):

$$C_{pk}^{LB} = \frac{\min(USL-\bar{x},\;\bar{x}-LSL) - z_{\alpha/2}\,\sigma_{UB}/\sqrt{n}}{3\,\sigma_{UB}} \quad (10)$$

$$C_{pk}^{UB} = \frac{\min(USL-\bar{x},\;\bar{x}-LSL) + z_{\alpha/2}\,\sigma_{UB}/\sqrt{n}}{3\,\sigma_{LB}} \quad (11)$$

Note that we could use equation (3) instead of (2), since they give very similar results and there would be no significant difference between the two in practical terms for most cases.

- For Cp, expression (17):

$$C_p^{LB,UB} = \frac{USL - LSL}{6\,\sigma_{UB,LB}} \quad (17)$$

Note that equation (17) is the synthesis of equations (15) and (16). Although a similar synthesis is
possible for Cpk, for clarity we consider it better to express it as two separate equations, expressions (10) and (11).
Two different approaches are possible when dealing with n < 30: one is to estimate confidence
intervals using all the expressions listed above, and the other is to use only the LB expressions for either Cpk or Cp, which are the worst possible cases for a given α.
With the confidence interval one can also decide, based on its width, whether taking more samples
would be worth it (see the sketch below).
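A minimal Python sketch of the complete interval method (scipy.stats; the sample data, limits and α' are illustrative assumptions):

```python
import numpy as np
from scipy import stats

def capability_ci(x, lsl, usl, alpha_index=0.05):
    """CI for Cpk and Cp when n < 30, per expressions (4), (10), (11) and (17)."""
    x = np.asarray(x, dtype=float)
    n, xbar, s = len(x), np.mean(x), np.std(x, ddof=1)

    alpha = 1 - np.sqrt(1 - alpha_index)  # equation (13): alpha for each parameter

    # Expression (4): bounds of sigma from the Chi-Square distribution
    sigma_ub = s * np.sqrt((n - 1) / stats.chi2.ppf(alpha / 2, df=n - 1))
    sigma_lb = s * np.sqrt((n - 1) / stats.chi2.ppf(1 - alpha / 2, df=n - 1))

    margin = stats.norm.ppf(1 - alpha / 2) * sigma_ub / np.sqrt(n)
    closest = min(usl - xbar, xbar - lsl)  # distance to the closest spec limit

    cpk_ci = ((closest - margin) / (3 * sigma_ub),   # expression (10)
              (closest + margin) / (3 * sigma_lb))   # expression (11)
    cp_ci = ((usl - lsl) / (6 * sigma_ub),           # expression (17), LB
             (usl - lsl) / (6 * sigma_lb))           # expression (17), UB
    return cpk_ci, cp_ci

rng = np.random.default_rng(7)
sample = rng.normal(10.0, 0.5, size=15)  # n = 15 < 30 (illustrative data)
print(capability_ci(sample, lsl=8.0, usl=12.0))
```

The width of the returned intervals is precisely what tells us whether taking more samples would be worth it.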

4 Application of the method

To facilitate the practical application of the method, an Excel file prepared by the author of this blog is available upon request at the following e-mail address:

rasanmar18@gmail.com

Monday, December 11, 2017

The Slippery Measurement System Paradox


1. Background


   This article clarifies how to proceed when a Measurement System
Analysis (MSA for short) is necessary, parts from the process have to be measured and chosen
to run the MSA, and no measurement system is available other than the one that is
going to be evaluated.

   Some authors and manuals set criteria on how to select units from the process to
run the MSA. Some just say the units have to cover most of the actual range; some specify
the units have to represent at least 80% of the current range of the process; others establish
100% of either the range or the tolerance width, depending on the purpose of the
Measurement System. All these empirical rules have the same concept behind them: the units/parts
selected have to be representative of the process that is going to be measured. MSA
techniques such as Gage R&R compare the uncertainty of the Measurement System to the
variability of the units themselves.

   It has to be said that there is no intrinsically good or bad measurement system; it is acceptable or
not acceptable for measuring the characteristic we need to measure. All Measurement Systems
have uncertainty. The only thing MSA techniques do is determine whether that uncertainty is
small enough, in comparison to part variation, to be dismissed.

   Doubts about whether we are proceeding correctly then come up immediately. If we have to
select parts/units from the process in a way that makes them representative of it, and
we only have the Measurement System (MS for short) that is going to be evaluated, how do
we know that the parts are selected correctly? Moreover, and this is the most important question: if
the MS is not acceptable and we do not know it yet, is it possible that the
result of the analysis would tell us the MS is acceptable when in reality it is not at all?

   In the next lines of this text, we are going to explain that the above situation is
indeed a paradox, which means it cannot happen. We are going to argue and demonstrate
that it is not possible to select parts with an unacceptable MS in a way that the MSA would tell
us to accept that MS.

2. The paradox argument

There are several mathematical expressions to compare MS uncertainty to
part variation. The most used are:

$$\%\,\text{Study Variation} = 100\;\frac{\sigma_{Gage}}{\sigma_{Total}} \quad (1)$$

$$\%\,\text{Contribution} = 100\;\frac{\sigma^2_{Gage}}{\sigma^2_{Total}} \quad (2)$$

$$\%\,\text{Tolerance} = 100\;\frac{6\,\sigma_{Gage}}{USL - LSL} \quad (3)$$

$$ndc = 1.41\;\frac{\sigma_{Parts}}{\sigma_{Gage}} \quad (4)$$
The acceptance criteria are:

% Study Variation < 10%
% Contribution < 1%
% Tolerance < 10%
Number of distinct categories (ndc) ≥ 14

   Please note that acceptance criteria can vary among different uses and organizations.
Here we show typical values.
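   As a sketch of how these metrics relate to each other, assuming the standard deviations of the gage and of the parts have been estimated in a Gage R&R study (all values illustrative):

```python
import math

def msa_metrics(sigma_gage, sigma_parts, lsl, usl):
    """Gage R&R comparison metrics, per equations (1)-(4)."""
    sigma_total = math.sqrt(sigma_parts**2 + sigma_gage**2)  # equation (5) below
    return {
        "% Study Variation": 100 * sigma_gage / sigma_total,     # (1)
        "% Contribution": 100 * sigma_gage**2 / sigma_total**2,  # (2)
        "% Tolerance": 100 * 6 * sigma_gage / (usl - lsl),       # (3)
        "ndc": 1.41 * sigma_parts / sigma_gage,                  # (4)
    }

# Illustrative values: a gage that is small compared to part variation
print(msa_metrics(sigma_gage=0.05, sigma_parts=0.6, lsl=8.0, usl=12.0))
```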

   Let us look at equation (1) and imagine we are using an MS which is not acceptable in
comparison to part variation. We know that:

$$\sigma^2_{Total} = \sigma^2_{Parts} + \sigma^2_{Gage} \quad (5)$$
   We can say that in an unacceptable MS, the variation due to the MS itself (the Gage) is big in
comparison to part variation and thus is a big proportion of the total as well. This
last statement is the key to what cannot happen if we use an unacceptable MS. The variation
measured during the part selection process behaves as
equation (5) shows. Therefore, the only thing that may happen is to select parts spanning less than
the true range of the process, not meeting what the practical rules recommend. That would
imply the variation of the Gage being an even bigger proportion of the whole variation. The
conclusion drawn from equations (1) to (4) and their respective acceptance criteria is that the MS
would be evaluated as not acceptable, as it actually is (see the sketch below).
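   A small Monte Carlo sketch of this argument (illustrative assumptions: a normal process, a gage whose error is half the part standard deviation, and parts selected evenly across their MEASURED values):

```python
import numpy as np

rng = np.random.default_rng(42)
sigma_parts, sigma_gage = 1.0, 0.5  # a clearly unacceptable gage

true_values = rng.normal(0.0, sigma_parts, size=10_000)
measured = true_values + rng.normal(0.0, sigma_gage, size=10_000)

# Select 10 parts evenly spread across the MEASURED range, as a practitioner would
idx = np.argsort(measured)[np.linspace(0, len(measured) - 1, 10).astype(int)]
sigma_parts_est = true_values[idx].std(ddof=1)

# In a real Gage R&R, repeated measurements of the selected parts would estimate
# sigma_gage; here we plug in its known value to compute % Study Variation (1)
sigma_total = np.hypot(sigma_parts_est, sigma_gage)  # equation (5)
print(f"% Study Variation ~ {100 * sigma_gage / sigma_total:.0f}%  (>> 10%: rejected)")
```

However the parts are chosen, the gage error stays in the total variation, so the study still rejects the unacceptable MS.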

   Based on the previous argument, we can also state that an MS cannot be accepted when it is
not acceptable, since that is not a possible situation; it is thus a paradox.

3. Discussion and Conclusions

   We have demonstrated in a simple way that an MS with big uncertainty cannot be
evaluated as acceptable when using Gage R&R techniques.

   The main conclusion is that when we need to analyze and evaluate an MS and parts/units
have to be selected, we can use that same MS for the selection.

   The risk of evaluating a not acceptable MS as acceptable does not exist, based on “the
Slippery Measurement System Paradox”.

Tuesday, December 6, 2016

How to decide the proper Beta Risk for our Process


Let’s start with a refresher…

Beta Risk in Measurement Systems is also known as Consumer’s Risk.

It is so called because Beta Risk is the proportion of defective parts assessed by the measurement system as OK parts and therefore sent to the internal or external customer; in other words, the proportion of defective parts not detected by our measurement system. In every Measurement System there is always a Beta Risk greater than zero. But what is the proper one? Is it always the typical 10% (sometimes 20%) mentioned in the technical literature?
The question is: Is Beta the actual Consumer's Risk? Is Beta the proportion of NOK parts actually sent to the next step of the process or to the customer?

Indeed it is not! Beta is not the proportion of NOK parts sent to the customer out of the total production. It is the proportion of NOK parts not detected by the Measurement System out of the total NOK parts produced. So the Consumer's Risk depends on Beta, but it also depends on the proportion of NOK parts produced by the process.

How to decide a proper Beta Risk

In the real world, Beta is always a risk which implies a cost: a cost sometimes driven by
sample size, sometimes by technology reasons, etc.

The correct approach is to take into account not only Beta but also the proportion of NOK parts produced by the process, as follows:

Proportion of NOK parts sent to the customer = Beta × p

Where:
- Beta: number of NOK parts assessed as OK / total number of NOK parts
- p: proportion of NOK parts produced by the process
NOTE: Beta and p are estimated parameters; therefore it is very important that a significant and representative sample is taken to estimate them.

From what has been said, we have to decide what risk of sending NOK parts to the customer we are assuming, and then decide Beta and p, which always imply a cost.

Example
Imagine we are talking about a characteristic classified as a Significant Characteristic by our customer, so a Ppk of 1,33 (long-term capability) is the maximum risk (minimum value of Ppk) our customer allows us for that characteristic.

That means the proportion of NOK parts would be 34 parts per million. Expressed as a proportion, it would be 0,000034.

Imagine our process is performing with a proportion of 1 Defect out of 1000 parts, which is p=0,001.

If we work out Beta using the formula from above: Beta = 0,000034/0,001=0,034 -> 3,4%

So in this example we would need our measurement system to have a Beta Risk of 3,4%, which is better (lower) than the “typical” value of 10%.

That would be the case if the process is running and it is not possible to improve the proportion of defective parts it produces (p), at least not immediately; we would then have to protect our customer with a Measurement System performing with a Beta of at most 3,4%. Once the customer is protected, we can analyze the problem, find the root causes, and improve the process without pressure.

Once the process is improved, imagine an improvement of 90% of the defects: we would end up with p = 0,0001 (1 defect out of 10 000 parts), and the Beta allowed would then be: Beta = 0,000034 / 0,0001 = 34%.

Once the process is improved, we would not need so much effort on the measurement system anymore.
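A minimal sketch of this trade-off in Python (the allowed customer ppm and the process proportions are the ones assumed in the example):

```python
def required_beta(allowed_customer_proportion, p_process):
    """Maximum Beta so that Beta * p stays within the customer's allowed risk."""
    return allowed_customer_proportion / p_process

allowed = 34e-6            # 34 ppm, from the Ppk = 1,33 requirement of the example
for p in [0.001, 0.0001]:  # before and after the 90% process improvement
    print(f"p = {p} -> required Beta <= {required_beta(allowed, p):.1%}")
```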

FINAL CONCLUSION:
The Beta Risk depends on the actual risk our customer allows us to assume and on how our process is performing (process capability).

Saturday, December 3, 2016

What about Beta in Hypothesis testing?


The main purpose of this post is to clarify the Beta Risk concept, how it works, and how to interpret its value. It is intended for Black Belts, so they can have a clear and complete understanding of how to deal with hypothesis testing results in different situations.

Alpha Risk and its relationship with the p-value is better understood than Beta. Nevertheless, let's see a refresher, as Beta and Alpha are both at work when using hypothesis testing methods.
Let's start with an example. In this first example we are going to reject the Null Hypothesis because the p-value is less than the Alpha risk.

Example:

We are considering changing suppliers for a part that we currently purchase from a supplier that charges us a premium for the hardening process. 
The proposed new supplier has provided us with a sample of their product.  They have stated that they can maintain a given characteristic of 5 on their product.
We want to test the samples and determine if their claim is accurate.
Statistical Problem:
H0: μN.S. = 5
Ha: μN.S. ≠ 5
Set Risk Levels and choose test:
  1-Sample t Test (population Standard Deviation unknown, comparing a sample mean to a target).
  α = 0.05       β = 0.10

We have a set of samples as shown below...
We are going to use a 1-sample t-test of the null hypothesis mean = 5, because the characteristic can be assumed normally distributed and we do not know the historical standard deviation of this new supplier.

One-Sample T: Values
Test of μ = 5 vs ≠ 5
Variable  N    Mean   StDev  SE Mean       95% CI                    T    P-Value
Values    9  4,7889  0,2472   0,0824  (4,5989; 4,9789)  -2,56  0,034
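The same test can be reproduced from the summary statistics with a short Python sketch (scipy.stats; the numbers are taken from the Minitab output above):

```python
import math
from scipy import stats

n, mean, sd, target = 9, 4.7889, 0.2472, 5.0

se = sd / math.sqrt(n)                      # standard error of the mean
t = (mean - target) / se                    # t statistic
p_value = 2 * stats.t.sf(abs(t), df=n - 1)  # two-sided p-value
ci = stats.t.interval(0.95, df=n - 1, loc=mean, scale=se)  # 95% CI for the mean

print(f"T = {t:.2f}, p = {p_value:.3f}, 95% CI = ({ci[0]:.4f}; {ci[1]:.4f})")
```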

Conclusion:
As the p-value is less than our criterion Alpha, which was 0,05 (5%), we reject the Null. The p-value of 0,034 is the probability of obtaining a sample mean at least as far from 5 as the one observed if the supplier's true mean really were 5. This probability is considered low, as our criterion for what is low or high was 5% (Alpha). So we reject the Null Hypothesis and say that there is enough statistical evidence, with a 95% Confidence Level, that the new supplier is not performing with an average of 5.

As we reject the Null, we do not care about Beta, which is the risk of being wrong when we fail to reject the Null Hypothesis.

Now imagine instead that the target had been 4,9: since the 95% CI (4,5989; 4,9789) contains 4,9, we would fail to reject. The correct statement then is: “we do not have enough statistical evidence to reject the Null Hypothesis, which was mean = 4,9, so we have to accept it”.
In this case, there are two possibilities for what is truly happening. One is that the means are different and, due to the sample size, we are not able to detect that amount of difference; we simply have too much risk (Beta risk) for a certain difference.
The other possibility is that they are not different.
Let's see how Beta works so as to have a clearer view of what could be happening. For this purpose we have to use Beta, as we are failing to reject. One could think that Beta works in a similar way as Alpha does: we could then work out Beta with Minitab's Power and Sample Size (Beta is 1 − power), using the data we have. The result would be: (see next result).

Power and Sample Size
1-Sample t Test
Testing mean = null (versus ≠ null)
Calculating power for mean = null + difference
α = 0,05  Assumed standard deviation = 0,2472
            Sample
Difference    Size     Power
    0,1111      24  0,559395
Beta = 1-0,559 = 0,441
By saying that Beta is 0,441, we are saying that if the population mean were the one from the sample, which has a difference from the target of 0,1111, we would fail to detect that difference in 44,1% of the cases.
This has nothing to do with the real problem: as we simply do not know the population mean, we cannot claim that the population mean is the one from the sample.

The way Beta has to be used in this case is: given the sample we have been provided with, we decide which risk we assume to be reasonable; let's take the typical one, Beta = 10%. In this case, using Minitab:

Power and Sample Size
1-Sample t Test
Testing mean = null (versus ≠ null)
Calculating power for mean = null + difference
α = 0,05  Assumed standard deviation = 0,2472
Sample
  Size  Power  Difference
    24    0,9    0,170852

Interpretation
The statement would be that, with a sample size of 24 and a standard deviation of 0,2472, the difference we can detect with a 10% risk is 0,171.

Conclusion
If a difference of 0,171 and/or a risk of 10% is not enough for us, we have to enter the required values in Minitab and take enough samples.

Imagine, using the same example, that a difference of 0,1 is not allowed, so we want to ensure at least that difference with a 10% risk (Beta risk). Therefore:

Power and Sample Size
1-Sample t Test
Testing mean = null (versus ≠ null)
Calculating power for mean = null + difference
α = 0,05  Assumed standard deviation = 0,2472
            Sample  Target
Difference    Size   Power  Actual Power
       0,1        67     0,9      0,903663
We would have to take at least 67 samples to detect a difference of 0,1 with a 10% risk.
Imagine that even a 10% risk is too high for us due to the cost it implies, so only a 1% risk is allowed.

With Beta = 0,01, Power = 0,99:
Power and Sample Size
1-Sample t Test
Testing mean = null (versus ≠ null)
Calculating power for mean = null + difference
α = 0,05  Assumed standard deviation = 0,2472
            Sample  Target
Difference    Size   Power  Actual Power
       0,1        115    0,99      0,990392
We would need a sample size of 115 to detect a difference of 0,1 with a 1% risk.
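These three Minitab calculations can be approximated in Python with statsmodels (a sketch; the effect size is the difference divided by the standard deviation):

```python
from statsmodels.stats.power import TTestPower

sd = 0.2472
analysis = TTestPower()

# 1) Power (and Beta = 1 - power) for a difference of 0,1111 with n = 24
power = analysis.power(effect_size=0.1111 / sd, nobs=24, alpha=0.05)
print(f"power = {power:.3f}, Beta = {1 - power:.3f}")

# 2) Detectable difference for n = 24 and power = 0,9
es = analysis.solve_power(nobs=24, power=0.9, alpha=0.05)
print(f"detectable difference = {es * sd:.3f}")

# 3) Required sample size for a difference of 0,1 with power 0,9 and 0,99
for target_power in (0.9, 0.99):
    n = analysis.solve_power(effect_size=0.1 / sd, power=target_power, alpha=0.05)
    print(f"power {target_power}: n = {n:.0f}")
```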

SUMMARY:

- From the sample we know the estimates of the standard deviation and the mean, so we can compute the probability of observing a sample statistic at least as extreme as ours if the null value were the true population parameter (this probability is the p-value).
- Following the statement above, the only thing we know about the population (thanks to the sample) is a probability distribution, not the true parameter values.
- Beta is the risk of not detecting a certain difference when the population changes. As we do not know the value of the population parameter, we cannot treat the difference between the sample value and the target as certain in order to work out a “real” Beta. The sample value only gives us a distribution, which means there are infinite possible differences, each with its own probability, and therefore infinite values for Beta as well.
- The way to use Beta is to decide which risk we could assume of not detecting a certain difference if that difference really existed; this gives us a sample size. Or, the other way around, when we are provided with a certain sample, it tells us which difference and which risk we are assuming.
- Beta cannot be used to accept or reject the Null Hypothesis. It has to be used to decide what sample size to take, or whether we already have a large enough sample.