Notes: Key concepts for hypothesis testing and case study 3

An important part of data analysis is statistical hypothesis testing. We will introduce the formal hypothesis testing approach using the example of abattoir slaughter statistics in Indonesia.

The steps in hypothesis testing are:

  1. Develop the question of interest or objective of the analyses
  2. Establish null and alternative hypotheses
  3. Determine an appropriate statistical test
  4. Calculate a test statistic value
  5. Determine the region of test statistic values where you will reject or retain the null hypothesis
  6. Determine a probability of observing the test statistic if the null hypothesis is true
  7. Reject or retain the null hypothesis
  8. Make inferences about the population of interest i.e. answer your question of interest.

Each of these steps is discussed below.

Question of interest (objective)

One way to understand whether beef production self-sufficiency is increasing is to examine slaughter statistics. One could examine changes in the different categories of cattle that are slaughtered in Indonesia over time. If slaughter numbers in some categories such as female reproductive cattle or imported Australian cattle are declining, this may indicate that self-sufficiency is increasing. That is, less reliance is placed on non-sustainable sources of cattle. A formal hypothesis testing approach is required to address this idea.

Assuming that changing patterns of slaughter statistics can indicate increasing self-sufficiency, the research question is: are numbers of slaughtered cattle in different classes of cattle constant over time?

Establish the null and alternative hypotheses

It is always important to first establish the null hypothesis. The null hypothesis is always the hypothesis of no effect. The null hypothesis is rejected or retained based on the value of the test statistic and probability (or measure of association and confidence interval).

H0 = There is no effect of time (month) on slaughter numbers in different categories of slaughtered cattle.

The alternative hypothesis is generally the opposite of the null hypothesis. The alternative hypothesis is accepted if you have enough statistical evidence to reject the null hypothesis. Otherwise, if there is insufficient evidence to reject the null hypothesis, the null hypothesis is retained. Here the alternative hypothesis is:

HA = There is an effect of time (month) on slaughter numbers in different categories of slaughtered cattle.

Determine an appropriate statistical test

A large number of statistical tests are available. Often several may be appropriate, but usually one particular test will be best. Please see the PowerPoint file "which test to use.ppt" before reading further. It gives a lot of information on the issues to consider when choosing an appropriate statistical test.

Here we have a data set with predominantly categorical data. That is, slaughtered cattle are classed into categories of cattle type. Cattle are also classified by month of slaughter, and month can be treated as categorical too. We have chosen a chi-squared test, which is appropriate for categorical data and for testing our null hypothesis.

Calculate the test statistic

Statistical tests rely on calculating a test statistic. We will demonstrate the calculation of a chi-squared test statistic using only a portion of the total data from the case study (data from only two months and two classes of cattle). During the exercises you will use the full dataset to calculate your test statistic.

The first step is to construct a contingency table which divides the data into different groups according to their type.

We have provided the following table, which comes from iSIKHNAS data. This is a 2 by 2 contingency table of the number of Australian cattle and productive Indonesian female cattle slaughtered in November and December 2013.

Observed data (columns are month of slaughter)

Cattle type                         | 2013-11 | 2013-12
Cattle from Australia               | 3962    | 6988
Productive Indonesian female cattle | 654     | 958

Next add marginal (row and column) and grand totals to the contingency table.

Cattle type                         | 2013-11 | 2013-12 | Row total
Cattle from Australia               | 3962    | 6988    | 10950
Productive Indonesian female cattle | 654     | 958     | 1612
Column total                        | 4616    | 7946    | Grand total = 12562
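The marginal and grand totals can also be checked with a few lines of code. The following is a minimal Python sketch using only the four observed counts from the table; the variable names are illustrative and are not part of iSIKHNAS.

```python
# Observed counts from the 2 by 2 contingency table.
# Outer keys are cattle types (rows), inner keys are months (columns).
observed = {
    "Cattle from Australia": {"2013-11": 3962, "2013-12": 6988},
    "Productive Indonesian female cattle": {"2013-11": 654, "2013-12": 958},
}
months = ["2013-11", "2013-12"]

# Row totals: sum across months for each cattle type.
row_totals = {cattle: sum(counts.values()) for cattle, counts in observed.items()}

# Column totals: sum across cattle types for each month.
col_totals = {m: sum(counts[m] for counts in observed.values()) for m in months}

# Grand total: every animal in the table.
grand_total = sum(row_totals.values())

print("Row totals:", row_totals)
print("Column totals:", col_totals)
print("Grand total:", grand_total)
```

The row totals and the column totals should both add up to the grand total, which is a quick check that the table has been entered correctly.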

The next step is to calculate expected values.

Expected values are the values we would expect in each cell if there was no effect in the contingency table, in other words if cattle type and month of slaughter were unrelated (independent). We normally expect small differences between observed and expected values because of sampling variability, even in data where there is no real effect. However, if the observed values are very different from the expected values, this indicates that an effect is present.

Expected values are calculated with the following formula:

\text{Expected value for a cell} = \frac{\text{row total} \times \text{column total}}{\text{grand total}}

This formula is applied to each data cell of the table (not the marginal or grand totals):

Expected values (columns are month of slaughter)

Cattle type                         | 2013-11                     | 2013-12                     | Row total
Cattle from Australia               | 10950 × 4616 / 12562 = 4024 | 10950 × 7946 / 12562 = 6926 | 10950
Productive Indonesian female cattle | 1612 × 4616 / 12562 = 592   | 1612 × 7946 / 12562 = 1020  | 1612
Column total                        | 4616                        | 7946                        | Grand total = 12562
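The expected values can be reproduced in code by applying the formula to every data cell. This is a small Python sketch under the assumption that the table is stored as a list of rows; it is only an illustration of the formula, not a prescribed method.

```python
# Observed counts: rows are cattle types, columns are months (2013-11, 2013-12).
observed = [
    [3962, 6988],  # Cattle from Australia
    [654, 958],    # Productive Indonesian female cattle
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected value for a cell = row total * column total / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for row in expected:
    print([round(value) for value in row])
```

Note that the expected values in each row and column add up to the same marginal totals as the observed values.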

Now calculate the chi-squared statistic (χ²)

This is calculated with the following formula:

\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}

This formula focuses on determining how different the observed and expected values are from each other in each matching cell.

For example, for the first data cell (Australian cattle slaughtered in November 2013), apply the formula as follows.

Observed = 3962, expected = 4024

The χ² value for this cell is:

\frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(3962 - 4024)^2}{4024} = 0.955

This is repeated for each data cell (not the marginal or grand totals) and the results are added (this is what the ∑ sign means, add everything).
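The cell-by-cell calculation can be sketched in Python as follows. The sketch uses the observed counts and the rounded expected values from the tables above, listed in the order Australian cattle (November, December) then productive Indonesian female cattle (November, December). Using unrounded expected values would give a slightly different result.

```python
# Observed counts and rounded expected values, in the same cell order.
observed = [3962, 6988, 654, 958]
expected = [4024, 6926, 592, 1020]

# Contribution of each cell: (observed - expected)^2 / expected.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# The chi-squared statistic is the sum of the four contributions.
chi_squared = sum(contributions)

for o, e, c in zip(observed, expected, contributions):
    print(f"observed={o}, expected={e}, contribution={c:.3f}")
print(f"chi-squared = {chi_squared:.3f}")
```

In a 2 by 2 table every cell has the same squared difference, so the cells with the smallest expected values contribute the most to the total.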

Thus the total χ² is:

\chi^2 = 0.955 + 0.555 + 6.493 + 3.769 = 11.772

The four terms are the cells for Australian cattle in November and December, followed by productive Indonesian female cattle in November and December.

Determine the region of test values where you will reject or retain the null hypothesis

Suppose you repeated the data collection a very large number of times when there was no effect present (i.e. the null hypothesis is true) and calculated a test statistic from each sample. These test statistics could be plotted as a probability distribution. Sampling variability is the reason that the test results are not all the same and that there is a distribution.

See Figure 7, where a chi-squared distribution is presented (along with the critical cut point value). This is the distribution of the chi-squared statistic for the small example we are using if the null hypothesis is true. A table with r rows and c columns has (r - 1) × (c - 1) degrees of freedom, so our 2 by 2 table has 1 degree of freedom.

Based on this distribution, it is commonly accepted that we can set a critical test value that separates the distribution into two parts. The critical value is established so that 95% of test statistic values are less than the critical value when the null hypothesis is true. This leaves only 5% of values from the distribution that exceed the critical value when the null hypothesis is true.

Figure 7: Chi-squared probability distribution function for 1 degree of freedom. The red shaded region is the area of rejection of the null hypothesis (chi-squared values greater than 3.84). The value of 3.84 is the critical value.

Critical values are commonly listed in textbooks or calculated by software, which can also interpret them for you automatically. Here the critical value is χ² = 3.84.

Determine a probability of observing the test statistic if the null hypothesis is true

We then compare our calculated value (χ² = 11.8) with the chi-squared critical value of 3.84. Our value exceeds it.

In fact, the probability of observing a chi-squared value at least as large as ours if the null hypothesis is true is extremely small (well below 0.05). In general, we always calculate a p value from our test statistic. If the p value is less than 0.05, we reject the null hypothesis.
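The comparison with the critical value and the p value can be sketched in Python without a statistics package. The sketch below assumes unrounded expected values and no continuity correction, so its statistic differs slightly from a hand calculation with rounded expected values. The p value formula used here, erfc(sqrt(χ²/2)), holds only for 1 degree of freedom, where the chi-squared variable is the square of a standard normal variable. The function names are illustrative.

```python
import math

CRITICAL_VALUE = 3.84  # cut point for 1 degree of freedom at the 5% level


def chi2_statistic(table):
    """Chi-squared statistic for a table of counts, using unrounded expected values."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            total += (obs - exp) ** 2 / exp
    return total


def p_value_1df(chi2):
    """Upper-tail probability of a chi-squared value with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


observed = [[3962, 6988], [654, 958]]
chi2 = chi2_statistic(observed)
p = p_value_1df(chi2)

print(f"chi-squared = {chi2:.2f}, p = {p:.5f}")
print("Reject H0" if chi2 > CRITICAL_VALUE else "Retain H0")
```

Comparing the statistic with the critical value and comparing the p value with 0.05 are two ways of making the same decision: the statistic exceeds 3.84 exactly when the p value is below 0.05.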

Reject or retain the null hypothesis

Because our test statistic exceeds the critical value (and the p value is less than 0.05), we reject the null hypothesis and accept the alternative hypothesis very confidently. There is a relationship between cattle type and slaughter month (or an effect in the table).

Make inferences about the population of interest

We rejected the null hypothesis and accepted the alternative hypothesis. We infer that there are differences in the proportions of cattle classes slaughtered over time.

We can examine the differences between observed and expected values to determine the nature of the effect and make further (tentative) inferences.

It seems that in November fewer Australian cattle and more productive Indonesian female cattle were slaughtered than expected under the null hypothesis, and that this reversed in December. This may be associated with the resumption of importation of Australian beef cattle into Indonesia. That is, with importation comes less pressure to slaughter local Indonesian female cattle. Alternatively, our inference about the reason for the statistically significant difference could be false because we do not understand the production system perfectly. Perhaps this is a seasonal (annual) change?

This demonstrates that common sense and an understanding of the system being investigated are important when interpreting statistical test results. In addition, a biological understanding is often just as important as the results of statistical tests. For example, in the observed results in the table above, productive Indonesian female cattle fell from 14% of the cattle slaughtered in November to 12% in December. Likewise, the proportion of Australian cattle slaughtered rose from 86% to 88%. These changes were statistically significant (partly because of the large sample size) even though they were quite small changes overall. You now have to weigh up the statistically significant test result against the size of the effect seen and decide how important you think the changes really are.
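The size of the effect can be examined by expressing each cell as a percentage of its month's total. A minimal Python sketch, using only the observed counts from the example table (the short labels are illustrative):

```python
# Observed counts by month, then by cattle type.
observed = {
    "2013-11": {"Australian": 3962, "Indonesian female": 654},
    "2013-12": {"Australian": 6988, "Indonesian female": 958},
}

# Percentage of each cattle type within each month.
percentages = {
    month: {
        cattle: 100 * count / sum(counts.values())
        for cattle, count in counts.items()
    }
    for month, counts in observed.items()
}

for month, shares in percentages.items():
    for cattle, pct in shares.items():
        print(f"{month} {cattle}: {pct:.1f}%")
```

Looking at percentages alongside the test result helps separate the question "is there an effect?" from the question "is the effect large enough to matter?".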

Errors in rejection/retention of hypotheses

Errors in rejection or retention of the null hypothesis are possible.

The first type of error occurs when we reject the null hypothesis when it is actually true (an α error, also called a type I error).

As we saw above, we set the critical cut point for rejection of the null hypothesis in the upper part of the chi-squared distribution (i.e. at the point where 95% of all chi-squared values are less than the critical cut point when the null hypothesis is true). This means that 5% of χ² values will exceed the cut point even though the null hypothesis is true. If the null hypothesis is true and we repeated the sampling many times, we would therefore falsely reject it in 5% of samples. This 5% is the α error rate, and it corresponds to the 95% confidence level we use when rejecting the null hypothesis. Note that it is not the probability that the null hypothesis is true given that our test statistic exceeded the cut point.
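The 5% false rejection rate can be illustrated with a small simulation. The Python sketch below repeatedly generates 2 by 2 tables in which row and column are assigned independently (so the null hypothesis is true) and counts how often the chi-squared statistic exceeds 3.84. The sample size of 500, the probabilities 0.5 and 0.4, and the random seed are arbitrary assumptions chosen for illustration, not values from the case study.

```python
import random

CRITICAL_VALUE = 3.84  # chi-squared cut point for 1 degree of freedom


def chi2_2x2(table):
    """Chi-squared statistic for a 2 by 2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        return 0.0  # statistic undefined for an empty row or column; treat as "retain"
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            total += (table[i][j] - expected) ** 2 / expected
    return total


def simulate_null_table(n, p_row, p_col, rng):
    """Draw n animals whose row and column are assigned independently."""
    table = [[0, 0], [0, 0]]
    for _ in range(n):
        i = 0 if rng.random() < p_row else 1
        j = 0 if rng.random() < p_col else 1
        table[i][j] += 1
    return table


rng = random.Random(1)  # fixed seed so the run is repeatable
n_sims = 2000
rejections = sum(
    chi2_2x2(simulate_null_table(500, 0.5, 0.4, rng)) > CRITICAL_VALUE
    for _ in range(n_sims)
)
false_rejection_rate = rejections / n_sims
print(f"Share of null samples rejected: {false_rejection_rate:.3f}")
```

The share of rejections should be close to 0.05, although it will vary a little from run to run with a different seed.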

The second type of error occurs when we retain the null hypothesis even though it is not true (a β error, also called a type II error). The main way to reduce this type of error is to increase the sample size as much as you can. The power of a study is how good the study is at avoiding this error, that is, how good it is at detecting an effect if it is really there (power = 1 - β).
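The link between sample size and power can also be illustrated by simulation. In the Python sketch below, the true proportion of one cattle class is assumed to be 14% in the first month and 12% in the second. These are hypothetical values chosen to resemble the percentages in the example, and the sample sizes, number of simulations and seed are likewise arbitrary. Power is estimated as the share of simulated samples in which the null hypothesis is rejected.

```python
import random

CRITICAL_VALUE = 3.84  # chi-squared cut point for 1 degree of freedom


def chi2_2x2(table):
    """Chi-squared statistic for a 2 by 2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        return 0.0  # statistic undefined for an empty row or column
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            total += (table[i][j] - expected) ** 2 / expected
    return total


def simulate_table(n_per_month, p_first, p_second, rng):
    """Columns are months; row 0 counts animals in the class of interest."""
    table = [[0, 0], [0, 0]]
    for j, p in enumerate((p_first, p_second)):
        for _ in range(n_per_month):
            i = 0 if rng.random() < p else 1
            table[i][j] += 1
    return table


def estimate_power(n_per_month, rng, n_sims=200):
    """Share of simulated samples in which a real effect is detected."""
    rejections = sum(
        chi2_2x2(simulate_table(n_per_month, 0.14, 0.12, rng)) > CRITICAL_VALUE
        for _ in range(n_sims)
    )
    return rejections / n_sims


rng = random.Random(2)  # fixed seed so the run is repeatable
power_small = estimate_power(500, rng)
power_large = estimate_power(5000, rng)
print(f"Estimated power with 500 animals per month: {power_small:.2f}")
print(f"Estimated power with 5000 animals per month: {power_large:.2f}")
```

With the same underlying effect, the larger sample rejects the null hypothesis far more often, which is why increasing the sample size reduces β errors.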

We will now do exercises to demonstrate how to do statistical hypothesis testing. We will use the full cattle abattoir slaughter data rather than the subset we used above.