Advanced Field Epi:Manual 1 - Disease Investigation


Disease Investigation

The Basic Field Epidemiology Manual provides information on the clinical examination, making a diagnosis, and the initial epidemiologic approach to disease investigation.

This section provides more advanced material to assist in the planning and implementation of epidemiologic investigations of disease and in the analysis of data collected from investigations. It is assumed that readers have already worked through the material presented in the Basic Field Epidemiology Manual.

Measures of disease frequency

Counts of cases, non-cases, and population at risk can be used to estimate epidemiological measures of disease frequency. These measures allow you to describe disease events and compare them over time. The most commonly used measures are prevalence and incidence.


The prevalence of a disease (or condition) is the proportion of cases in a population at a given point in time.

Budi looks after 15 cows; he has contacted you because 5 animals are sick with the same signs of disease. Soleh looks after 30 cows in the same area as Budi's cows and none of Soleh's cows are sick. All the cows graze together in one area.

The prevalence of disease is

P = \frac{5}{15 + 30} = 0.11 = 11\%

\text{Prevalence} = \frac{\text{number of cases}}{\text{population at risk}}

The simplest estimate of the population at risk is the sum of cases and non-cases (all animals) on the farm or farms where the investigation is taking place.
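The prevalence calculation for Budi's and Soleh's cows can be sketched in a few lines of Python. This is only an illustration: the function name `prevalence` is hypothetical, and the population at risk is taken, as described above, to be all animals (cases plus non-cases) grazing together.

```python
def prevalence(cases: int, population_at_risk: int) -> float:
    """Proportion of the population at risk that are cases at one point in time."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return cases / population_at_risk


# Budi's 15 cows (5 sick) and Soleh's 30 cows (none sick) graze together,
# so the population at risk is taken as all 45 cows.
budi_cows, soleh_cows, sick = 15, 30, 5
p = prevalence(sick, budi_cows + soleh_cows)
print(f"Prevalence = {p:.2f} ({p:.0%})")
```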

Prevalence may not always provide a good measure of the occurrence of new cases of a disease. An example is when cases are based on the results of serology tests measuring antibodies, which are a marker of prior exposure. In this situation a positive test may indicate an infection that occurred months or years previously and may have nothing to do with the current disease.

In order to understand whether prevalence may reflect recent disease cases or a mixture of old and recent cases, you will need to understand the nature of the disease being investigated, how animals respond to being infected and the type of diagnostic test that has been used to classify animals as cases or non-cases.


The incidence is the number of new cases that arise in a population over a specified period of time. Unlike prevalence, incidence reflects risk, or the likelihood of an individual animal contracting the disease in a given period of time.

Incidence can be calculated in different ways:

  • cumulative incidence (CI) or incidence risk
  • incidence rate (IR) (or incidence density)

Cumulative incidence is the number of animals that develop the disease in a defined period of time divided by the number of healthy animals at risk at the start of that period. All incidence measures should be based only on new cases of disease that occur in the time period of interest.

Budi watches the 45 cows over the next 7 days. He calls and says that 8 more cows have become sick. The animals at risk are those that are disease free at the start of a period. At the start of the 7-day period there are 45 cows, but 5 have already been diagnosed with the disease, so 40 cows were disease free at the start of the 7-day window.

\mathit{CI} = \frac{8}{40} = 0.20 = 20\% \quad \text{(7-day period)}
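A minimal sketch of the same calculation for a closed population, assuming (as in the example) that the 5 cows already sick are prevalent cases and so are excluded from the animals at risk. The function name is illustrative only.

```python
def cumulative_incidence(new_cases: int, disease_free_at_start: int) -> float:
    """New cases in the period divided by the animals disease free at the start."""
    if disease_free_at_start <= 0:
        raise ValueError("need at least one animal at risk")
    return new_cases / disease_free_at_start


herd_size = 45          # all cows grazing together
already_sick = 5        # prevalent cases cannot become new cases
new_cases_7_days = 8    # cows that became sick during the 7-day period

ci = cumulative_incidence(new_cases_7_days, herd_size - already_sick)
print(f"7-day cumulative incidence = {ci:.2f} ({ci:.0%})")
```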

Estimation of incidence is easiest when the population of animals does not change over time (no new animals arrive and no animals move away). This is called a closed population. Often the population may change over time because new animals arrive or some animals may be sold or moved away. This is called an open population or dynamic population.

If animals are lost to follow-up during the period of interest, then the denominator should be adjusted to take this into account. The same principle applies if any new animals enter the population at risk during the time period. There are several commonly used methods for adjusting the count of population at risk in open populations. The two approaches most commonly used for cumulative incidence estimation are described below:

  1. Number of disease free animals at the start of the follow-up period for closed populations (simplest approach but may be biased if there is a lot of animal movement).
  2. Population size at mid-point of the follow-up period for open populations which may be estimated as

\text{Average number at risk} = N_{\text{Start}} + \frac{1}{2}N_{\text{New}} - \frac{1}{2}\left(N_{\text{Lost}} + N_{\text{Cases}}\right)

\text{Cumulative incidence} = \frac{\text{number of new diseased animals in the time period}}{\text{average number of animals at risk at the mid-point of the period}}

At the start of the year there were 1000 cows in the population of interest, all of them disease free. A total of 400 cows were sold halfway through the year. During the year, 20 cows died from anthrax.

N_{\text{Start}} = 1000

N_{\text{New}} = 0

N_{\text{Lost}} = 400

N_{\text{Cases}} = 20

Number at risk = 1000 - 0.5*400 - 0.5*20 = 790

Cumulative incidence = 20/790 = 0.025 = 2.5%
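The mid-point adjustment for an open population can be sketched as below. The function name is hypothetical; it simply encodes the formula above, which assumes that, on average, animals arriving, leaving or becoming cases do so halfway through the period.

```python
def average_at_risk(n_start: float, n_new: float, n_lost: float, n_cases: float) -> float:
    """Approximate mid-period population at risk for an open population.

    Animals entering are assumed to contribute half the period on average;
    animals lost to follow-up and new cases are assumed to leave halfway through.
    """
    return n_start + 0.5 * n_new - 0.5 * (n_lost + n_cases)


# Anthrax example: 1000 disease-free cows, 400 sold, 20 deaths, no arrivals.
par = average_at_risk(n_start=1000, n_new=0, n_lost=400, n_cases=20)
ci = 20 / par
print(f"Average number at risk = {par:g}")
print(f"Cumulative incidence = {ci:.3f} ({ci:.1%})")
```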

Incidence rate is the number of new cases of disease that occur per unit of animal-time at risk during the defined follow-up period. Incidence rate requires more detailed estimation of animal-time at risk compared to cumulative incidence.

For closed populations (no movement of animals in or out), the denominator is the number of disease-free animals at the start multiplied by the length of the follow-up period (in days, weeks, months or even years).

For open populations (animal movement in or out) we need to adjust the denominator to take movements into account. Again the simplest way to do this is often to add half the number of animals that were added to the population at risk during the period and subtract half the number that left the population at risk, using the same approach as method 2 described above for cumulative incidence. The adjusted number is then multiplied by the length of the follow-up period.

If there is detailed follow-up data on individual animals, we can generate the denominator animal-time at risk by calculating the exact time at risk for each individual animal and summing these.

Example of incidence rate using an approximate count of the population at risk (PAR)

At the start of the year there were 20 cows in the population of interest, all of them disease free. A total of 3 cows were sold halfway through the 12-month period. During the year, 2 cows died from haemorrhagic septicaemia.

N_{\text{Start}} = 20

N_{\text{New}} = 0

N_{\text{Lost}} = 3

N_{\text{Cases}} = 2

Number at risk = 20 - 0.5*3 - 0.5*2 = 17.5

Time period = 12 months

Animal time at risk = 17.5*12 = 210 animal months

Incidence rate = 2/210 = 0.0095 cases per animal month

We can change the units of the animal-time at risk to 100 animal-months

0.0095 cases per animal-month = 0.95 cases per 100 animal-months
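The same approximate incidence rate can be sketched in Python. The names are illustrative; the population at risk uses the mid-period approximation described earlier, and the result is also rescaled to cases per 100 animal-months.

```python
def incidence_rate(new_cases: float, animal_time_at_risk: float) -> float:
    """New cases per unit of animal-time at risk."""
    return new_cases / animal_time_at_risk


# Haemorrhagic septicaemia example: 20 cows, 3 sold, 2 cases, 12 months.
n_start, n_new, n_lost, n_cases = 20, 0, 3, 2
months = 12

# Mid-period approximation of the population at risk (open population).
avg_at_risk = n_start + 0.5 * n_new - 0.5 * (n_lost + n_cases)
animal_months = avg_at_risk * months

ir = incidence_rate(n_cases, animal_months)
print(f"Animal-time at risk = {animal_months:g} animal-months")
print(f"IR = {ir:.4f} cases per animal-month = {ir * 100:.2f} per 100 animal-months")
```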

Example of incidence rate using exact estimate of PAR

Assume 4 healthy cows were present on the farm at the start of the period and they were followed for 30 days.

1 cow was not sick at all throughout the 30 days = 1 animal month at risk

1 animal got sick on day 10 = 0.33 animal months at risk

1 animal got sick on day 20 = 0.67 animal months at risk

1 animal was sold on day 15 = 0.5 animal months at risk

Total animal time at risk = 2.5 animal months

Total new cases = 2

Incidence rate = 2/2.5 = 0.8 cases per animal month
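When individual follow-up records are available, the exact animal-time at risk is just the sum of each animal's time at risk. A minimal sketch, assuming records in days and 30 days = 1 animal-month as in the example; the cow labels A to D are hypothetical.

```python
FOLLOW_UP_DAYS = 30  # 30 days is treated as 1 month, as in the example

# Each cow stops contributing time at risk when it becomes sick or leaves the herd.
records = [
    {"cow": "A", "days_at_risk": 30, "new_case": False},  # healthy throughout
    {"cow": "B", "days_at_risk": 10, "new_case": True},   # sick on day 10
    {"cow": "C", "days_at_risk": 20, "new_case": True},   # sick on day 20
    {"cow": "D", "days_at_risk": 15, "new_case": False},  # sold on day 15
]

# Convert each animal's days at risk to months and sum them.
animal_months = sum(r["days_at_risk"] for r in records) / FOLLOW_UP_DAYS
new_cases = sum(r["new_case"] for r in records)  # True counts as 1

ir = new_cases / animal_months
print(f"Animal-time at risk = {animal_months:.2f} animal-months")
print(f"Incidence rate = {ir:.2f} cases per animal-month")
```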

Some important points to remember:

  • Incidence is a dynamic measure of disease whereas prevalence is only a static measure of disease
  • Incidence and prevalence are related. The prevalence of disease in a population at risk reflects both the incidence of new cases and the duration of disease in individual cases: Prevalence ≈ Incidence × Duration, provided the population is stable (in a steady state) and the disease is relatively rare.
  • Changes in the incidence or the duration of a disease will change the prevalence. The incidence rate is usually greater than prevalence if the disease is short in duration and/or fatal. Prevalence is usually greater than the incidence if the disease is chronic in nature.
  • Cumulative incidence (CI) provides a direct estimate of the likelihood (risk) of an animal experiencing the event of interest during the time period. CI has meaning on an individual basis as well as on a population basis.
  • Counting the denominator (animal time at risk) for incidence estimates can be problematic particularly if animals enter or leave the population during the time of interest. There are a number of ways to deal with this problem.
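The link between incidence, duration and prevalence can be illustrated with a small sketch. It assumes a stable (steady-state) population, where the prevalence odds P/(1 - P) equal the incidence rate among disease-free animals multiplied by the mean duration; P ≈ IR × D is the rare-disease simplification. The function name and the input values are hypothetical.

```python
def prevalence_from_incidence(incidence_rate: float, mean_duration: float) -> tuple[float, float]:
    """Return (approximate, steady-state) prevalence.

    incidence_rate: new cases per unit of disease-free animal-time.
    mean_duration: average duration of disease, in the same time unit.
    Approximation: P ~= IR * D (reasonable only when the disease is rare).
    Steady state:  P / (1 - P) = IR * D, so P = IR*D / (1 + IR*D).
    """
    id_product = incidence_rate * mean_duration
    return id_product, id_product / (1 + id_product)


# Hypothetical inputs: 0.01 new cases per animal-month at risk.
short_approx, short_exact = prevalence_from_incidence(0.01, mean_duration=3)       # short illness
chronic_approx, chronic_exact = prevalence_from_incidence(0.01, mean_duration=40)  # chronic disease
print(f"3-month duration:  P ~ {short_approx:.3f} (steady state {short_exact:.3f})")
print(f"40-month duration: P ~ {chronic_approx:.3f} (steady state {chronic_exact:.3f})")
```

With the same incidence, the longer-lasting disease has a much higher prevalence, and the simple product overstates prevalence once the disease is no longer rare.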

Table 2: Comparison of the main features of prevalence, incidence rate and cumulative incidence

Feature | Point prevalence | Period prevalence | Incidence rate | Cumulative incidence
Numerator | All cases counted on a single occasion | Cases present at the start of the period + any new cases during the period | New cases during the follow-up period | New cases during the follow-up period
Denominator | All individuals examined | All individuals examined | Sum of time at risk for susceptible animals present at the start of the period | All susceptible individuals present at the start of the follow-up period
Time | Single point in time | Defined follow-up period | Measured for each individual from the start to the end of the period, until disease occurs or until the animal exits the population | Defined follow-up period
Study type | Cross-sectional | Cohort | Cohort | Cohort
Interpretation | Probability of disease at a point in time | Probability of having disease over a defined period | How quickly new cases develop over a defined follow-up period | Probability of developing disease over a defined period

Attack rate

Attack rate (also called attack risk) is a specific type of incidence estimate (either a cumulative incidence or an incidence rate) that applies to outbreaks or situations where the period of observation is relatively short and the population at risk is tightly defined, e.g. the number of animals on the farm under investigation.

An attack rate is the number of cases of the disease divided by the number of animals at risk at the beginning of the outbreak (the outbreak covers a defined time interval).

\text{Attack rate} = \frac{\text{number of animals affected}}{\text{number of animals exposed}}

For example, the attack rate can be used to measure mortality due to yellow head virus infection in prawns. If 3500 of the 5000 prawns in a pond die over a 4-day period, the attack rate is 3500/5000 = 0.7 or 70%.

Data analyses to describe patterns of disease

It is assumed that the initial clinical examination and case definition have been completed and that data have been collected on cases and non-cases on those farms where diseased animals are located.

Confirm the outbreak

Often the major reason that an epidemiologic investigation is begun into a particular disease is because there is concern that there are more cases of disease than expected. Possible reasons may be an outbreak of a new disease, an exotic disease or some change in an endemic disease that has made it more infectious or capable of causing different disease signs.

As soon as a case definition is produced and counts are made of cases and non-cases it is important to produce some simple estimates of frequency (prevalence or incidence) to confirm that there is an increase in disease cases above what is expected and that further investigation is warranted.

Temporal patterns

Variation in the frequency of occurrence of cases of a disease over time is called its temporal pattern. There are three basic time spans that may be used to describe temporal patterns:

  • an epidemic period, which is the time from the start of a disease outbreak to its end (this may vary from days to weeks, months or longer);
  • a 12 month period to describe seasonal patterns; and
  • a long period of many years to identify long-term trends.

The simplest way to display the temporal pattern of disease cases is an epidemic curve. The epidemic curve is a graph plotting the number of cases of the disease on the vertical axis against the time of onset of each case on the horizontal axis, either as a bar graph or as a frequency polygon. The first case identified in a particular outbreak is referred to as the index case. For infectious diseases, identifying the index case is important because information about it can be valuable in determining the source of the outbreak and the incubation period.

An epidemic curve is a graph of the number of cases of disease against the time of onset of each case

In general, an epidemic curve has four distinct components and in some cases there may be a secondary occurrence of additional cases (a 5th component). These are displayed in the following figure.

Figure: Diagrammatic representation of the components of an epidemic curve (1 = endemic level, 2 = ascending branch of the epidemic, 3 = peak of the epidemic, 4 = descending branch of the epidemic, 5 = secondary peak).

The slope of the ascending branch can indicate the type of exposure (propagating or common source) or the mode of transmission and incubation period of the disease agent. If transmission is rapid and the incubation period short, then the ascending branch will be steeper than if transmission is slow or if the incubation period is long.

A point-source epidemic is one where all animals (units) are exposed to the source of disease (agent or toxin) over a very short period of time, resulting in a very steep ascending branch of the epidemic curve.

A propagating epidemic is one where transmission occurs among individuals in the population, so that the ascending branch ascends more gradually.

The length of the plateau and the slope of the descending branch are related to the availability of susceptible animals, which in turn depends on many factors such as stocking densities, introductions into the population, the changing importance of different mechanisms of transmission and the proportion of immune animals in the population at risk.

Secondary peaks are usually due to the introduction of new susceptible animals into the diseased area, spread of the disease into a new area containing susceptible animals, or a change in the mode of transmission.

The interval of time chosen for graphing the cases is important to the subsequent interpretation of the epidemic curve. The time interval should be selected on the basis of the incubation period of the disease and the period over which the cases are occurring. Choice of a time interval may also depend on the frequency with which animals are being examined in order to determine when any one animal first shows signs of disease.

For many livestock diseases epidemic curves are often produced using one-day (daily) intervals on the horizontal axis, producing a plot displaying the number of new cases of disease each day. If there are multiple days in between occurrence of new cases then it may be sensible to aggregate the time to weekly or some other interval.
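Aggregating onset dates into daily or weekly counts is easy to sketch in Python with the standard library. The onset dates below are hypothetical, for illustration only; the function name is also illustrative. Empty intervals are kept so that gaps between cases stay visible on the curve.

```python
from collections import Counter
from datetime import date, timedelta


def epidemic_curve(onset_dates, interval_days=1):
    """Count new cases per time interval, starting at the earliest onset date.

    Returns a list of (interval_start_date, case_count), including empty
    intervals so that gaps between cases remain visible on the curve.
    """
    if not onset_dates:
        return []
    start = min(onset_dates)
    counts = Counter((d - start).days // interval_days for d in onset_dates)
    n_bins = max(counts) + 1  # highest interval index that contains a case
    return [(start + timedelta(days=i * interval_days), counts.get(i, 0))
            for i in range(n_bins)]


# Hypothetical onset dates, for illustration only.
onsets = [date(2024, 3, 1), date(2024, 3, 3), date(2024, 3, 3), date(2024, 3, 9),
          date(2024, 3, 10), date(2024, 3, 10), date(2024, 3, 11), date(2024, 3, 20)]

for label, width in (("Daily", 1), ("Weekly", 7)):
    print(label)
    for interval_start, n in epidemic_curve(onsets, width):
        print(f"  {interval_start.isoformat()}  {'#' * n} ({n})")
```

The text bars are only a quick check; in practice the counts would be plotted as a bar graph.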

Menangle virus was first identified in 1997, following an investigation of an outbreak of mummified and stillborn foetuses in a commercial piggery at Menangle, New South Wales, Australia.

For the Menangle virus outbreak in an Australian piggery in 1997, temporal patterns were analysed on a weekly basis, because many piggery records are maintained as weekly averages and the epidemic extended over a period of more than 20 weeks. In addition to the percentage of affected litters per week, average litter sizes and numbers of piglets that were live, mummified or stillborn were plotted, providing a comprehensive picture of the temporal pattern. All indices showed a very rapid rise from week 15 (of the calendar year), when the outbreak started, to week 21, when case numbers peaked. This pattern is strongly suggestive of a propagating epidemic with a rapidly spreading agent and relatively short incubation period (see Figure 2 in Love et al., Australian Veterinary Journal 79(3):192-198, 2001 for a graphical representation of these patterns).

Spatial patterns

Spatial patterns describe the outbreak in terms of where animals were located (place) when they first showed signs of disease (disease onset). Spatial patterns may assist with finding the source of the outbreak. It is often useful to consider place and time together. This can be done by drawing a plan of the spatial layout of the farm (or population) and recording the locations and dates of cases. Such a diagram may also give a clue as to whether the outbreak has a common source or is propagating.

Where disease cases are occurring on a small geographic scale, it may be quick and easy to draw simple maps showing disease cases and non-cases. This approach can be used by anyone and does not require mapping software or digital map data.

Where disease cases are occurring over a larger area or on a larger scale, it may be more effective to map cases using computer based mapping or Geographic Information Systems (GIS) software. This may require special expertise and additional background mapping files for that location.

Where disease cases are occurring over a more extended period (weeks or more) it is very useful to produce maps at daily, weekly or longer intervals to monitor progress of the epidemic and identify patterns of spread.

For example, the following Figure shows the layout of households in a Thai village, overlaid with the occurrence of cases of foot-and-mouth disease. From this map, it is apparent that this is a propagating epidemic, with the index case identified by a red circle, a small number of secondary cases identified in week 2 and additional cases in week 3. It also appears that infection has spread locally from the index case to a number of nearby households, as well as to some more remote households, where there has also been local spread. The initial spread was perhaps through utilisation of common grazing, allowing close contact between early cases and susceptible animals from elsewhere in the village. This was probably followed by local spread among clusters of households and perhaps from infected animals moving on laneways through the village.

Spatial representation of the spread of foot-and-mouth disease over a 3-week period in a Thai village, adapted from Cleland et al. (1991).


For the Menangle virus outbreak, the piggery comprised four separate management units. Unit 1 was about 200 m from Unit 2, while Units 2 and 3, and Units 3 and 4, were each separated by about 50 m (see Figure 1 in Kirkland et al., 2001). Although all units were affected, 44% of litters were affected in Unit 2, compared to 28%, 26% and 37% for Units 1, 3 and 4 respectively. Analysis also showed that Unit 3 was affected first, in week 15. Other units were subsequently affected in weeks 23 (Unit 2), 24 (Unit 4) and 27 (Unit 1). It was also observed that a fruit bat colony (the hypothesised source of infection) was in close proximity to Units 3 and 4. Unit 1 was furthest from the hypothesised source and was the last unit to be affected, while Units 3 and 4 were closest to it.

Animal patterns

The term animal patterns refers to a measure of disease frequency (for example prevalence or incidence) produced for different animal characteristics (species, breed, age, sex, weight class, vaccination status, stocking density) or for different levels of other management factors, such as location (paddock, pen, village), feed type or other variables.

We use some measure of disease frequency to describe animal patterns of disease. The most common approach in a disease outbreak is to estimate attack rates (AR) but other measures may be used. This approach allows investigation of possible risk factors for the disease.

For example, in the foot-and-mouth disease outbreak shown in the previous figure, 8 of 21 buffalo <1 year old were affected, giving an attack rate of 0.38 or 38%. In contrast, 34 of 158 buffalo >1 year old were affected (attack rate = 0.215 or 21.5%). This suggests that young animals were almost twice as likely to be affected as older animals.

For example, say there were deaths due to suspected epizootic ulcerative syndrome (EUS) in a pond and it appeared that small fish were at greater risk of having EUS than large fish. We might make the following calculations:

For small fish: \mathit{AR}_1 = \frac{\text{No. of small fish with EUS}}{\text{Total small fish}}

For large fish: \mathit{AR}_2 = \frac{\text{No. of large fish with EUS}}{\text{Total large fish}}

There were 1000 small fish in the pond and 300 had EUS and there were 1000 large fish of which 100 developed EUS during the outbreak. The attack rates here are 30% and 10% respectively, suggesting that small fish were 3 times more likely to develop EUS than large fish. This finding could lend support to a hypothesis that nutritionally stressed fish are more susceptible to infection.
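Factor-specific attack rates and their ratio to a reference group can be sketched as below, using the EUS and foot-and-mouth disease counts given in the text. The function name and the dictionary layout are illustrative assumptions, and the sketch assumes the reference group has at least one case.

```python
def attack_rate_table(groups: dict[str, tuple[int, int]], reference: str):
    """Attack rate per group and relative risk versus a reference group.

    groups maps a group label to (number of cases, total number of animals).
    Returns {label: (attack_rate, relative_risk)}.
    Assumes the reference group has at least one case.
    """
    ref_cases, ref_total = groups[reference]
    ref_ar = ref_cases / ref_total
    table = {}
    for label, (cases, total) in groups.items():
        ar = cases / total
        table[label] = (ar, ar / ref_ar)
    return table


# EUS example: small vs large fish, large fish as the reference group.
eus = attack_rate_table({"small": (300, 1000), "large": (100, 1000)}, reference="large")
# FMD example: buffalo <1 year vs >1 year old, older buffalo as the reference group.
fmd = attack_rate_table({"<1 year": (8, 21), ">1 year": (34, 158)}, reference=">1 year")

for name, table in (("EUS", eus), ("FMD", fmd)):
    for label, (ar, rr) in table.items():
        print(f"{name:4} {label:8} AR = {ar:.1%}  RR = {rr:.2f}")
```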

Factor-specific attack rates for factors such as species, age, sex, feed, mob and management system can be explored. An example is shown below for EUS, where fish size (as an indicator of nutritional stress) is suspected to be a risk factor.

Table showing counts of fish arranged by fish size and EUS disease status along with attack rates for each class of fish size and relative risks comparing the risk of disease to that observed in a reference category (fish size = large).

Fish size (factor) | EUS cases (count) | Total count | Attack rate (AR, %) | Relative risk (RR)




In the above table, attack rates are expressed as percentages. The last column is the Relative Risk or Risk Ratio (RR) which is the ratio of the attack rates comparing the AR in small and medium sized fish to the AR measured in large fish.

The higher the relative risk, the stronger the association between the specific factor and an increased risk of disease. Small fish were 3 times more likely to have EUS than medium sized fish and 6 times more likely than large fish. Also, medium sized fish were at twice the risk of large fish.

Then we need to think about what differences in fish size mean. It may be that smaller fish are younger or under more nutritional stress and that these factors (age, feed availability, stress) may be the causes driving the apparent association between fish size and EUS disease risk.

Measuring association between disease and risk factors

Once data have been collected on temporal, spatial and animal-level patterns, they need to be analysed to understand patterns and identify potential risk factors. The most commonly used measures for comparing disease risk among groups are the relative risk (or risk ratio) and the odds ratio. These are discussed briefly below.

2 x 2 tables

A 2x2 table is a simple way to present summary counts of animals classified by disease status (case or non-case) and by some other possible risk factor with two levels (e.g. vaccination status classified as vaccinated or not vaccinated). 2x2 tables are very commonly used in epidemiology to assess possible associations between disease occurrence and possible risk factors.

Often disease status (diseased or not diseased) is arranged in columns and the risk factor (present = exposed, not present = unexposed) is assigned to the rows. A 2x2 table has the following format.

Layout for a 2x2 table showing counts of animals arranged by disease status (columns) and risk factor status (rows).

Risk factor | Diseased (case) | Not diseased (non-case) | Total
Present (exposed) | a | b | a + b
Absent (unexposed) | c | d | c + d
Total | a + c | b + d | a + b + c + d

2x2 tables are commonly used to estimate measures of association such as relative risk or odds ratios.

Relative Risk

The relative risk (RR) is the ratio of a measure of incidence in the exposed group to the measure of incidence in the unexposed group. It can be based on the incidence rate or cumulative incidence and, in some situations, on prevalence. Relative risks are the primary measure of disease association and should always be used when it is possible to estimate a population at risk.

Table showing 2x2 layout and calculations required for estimation of a relative risk

Risk factor | Diseased | Not diseased | Total
Exposed | a | b | a + b
Unexposed | c | d | c + d
Total | a + c | b + d | a + b + c + d

Incidence rate in the exposed: \mathit{IR}_{\text{exp}} = \frac{a}{a + b}

Incidence rate in the unexposed: \mathit{IR}_{\text{unexp}} = \frac{c}{c + d}

\text{Relative risk} = \mathit{RR} = \frac{\mathit{IR}_{\text{exp}}}{\mathit{IR}_{\text{unexp}}} = \frac{a/(a + b)}{c/(c + d)}

There are a number of calculators or tools and apps that can analyse 2x2 tables and estimate Relative Risks or Odds Ratios. Typically these tools will also produce an estimate of the confidence interval and a p-value testing whether the RR is statistically different to 1.
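A minimal sketch of what such a calculator does for the RR, assuming the a, b, c, d layout of the 2x2 table above. The confidence interval here uses the common log (Katz) method; other tools may use different methods and give slightly different limits. The function name is illustrative, and the example reuses the small (exposed) versus large (unexposed) fish counts from the EUS example.

```python
import math


def relative_risk(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Relative risk from a 2x2 table with an approximate 95% confidence interval.

    a, b = exposed diseased, exposed not diseased
    c, d = unexposed diseased, unexposed not diseased
    The interval uses the log (Katz) method; it is unreliable when a cell is
    zero or the counts are very small.
    """
    risk_exp = a / (a + b)
    risk_unexp = c / (c + d)
    rr = risk_exp / risk_unexp
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Small fish (exposed) vs large fish (unexposed) from the EUS example.
rr, lo, hi = relative_risk(a=300, b=700, c=100, d=900)
print(f"RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Because the whole interval lies above 1, these data would be read as an association between small size and increased risk of EUS.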

  • If the RR is greater than 1 and is associated with a significant p-value or if the 95% confidence interval for the RR does not include 1, then the risk factor is associated with an increased risk of disease.
  • If the RR is less than 1 and is associated with a significant p-value or if the 95% confidence interval for the RR does not include 1, then the risk factor is associated with a reduced risk of disease. Often factors with a RR less than one are called protective risk factors because they are associated with a reduced risk of disease.

The numeric value of the RR is a measure of the strength of association between the risk factor and disease. If possible the 95% confidence interval for the RR should be examined. If the confidence interval does not overlap one then we can be more confident that the association may be meaningful. A statistical test and p-value will provide similar information - telling you whether the RR is statistically different to one.

Note that the RR alone does not provide evidence of causality. It provides a measure of statistical association. Additional information is required before we can determine that a particular risk factor may be causal for a particular disease.

Factor-specific attack rates and corresponding relative risks for such factors as species, age, sex, feed, mob, management system, etc can be computed and arranged in an attack rate table as shown below. An attack rate table is simply a tabular presentation of attack rates for different risk groups, accompanied by relative and attributable risk values for comparison between groups.

Attack rate table for risk factors for stillbirths in a group of Hereford heifers.

Factor | Level | Attack rate (%) | Relative risk | Attributable risk
Age | 14 months at joining | | 2.3 |
Age | 17 months at joining | 15.5 | |
Sire breed | Hereford | | |
Sex of calf | Female | | |
Type of birth | Assisted | | |

In the above table, attack rates are expressed as percentages. The second last column is the Relative Risk or Risk Ratio (RR) which is the ratio of the attack rates and the last column is the difference in attack rates (the Attributable Risk).

In the example in the table, the highest relative risk is 2.3, indicating that younger heifers (14 months) were at 2.3 times the risk of having a stillborn calf compared to older (17 months) heifers. However, this has to be interpreted with caution, as the attack rate for older heifers was 15.5%, suggesting that other factors may also have been involved in causing this problem. The other relative risks are all less than 2, suggesting that these factors are not very important. Therefore, from the data provided we can determine that younger heifers are at increased risk of stillbirth, but that there are probably additional factor(s), for which we don't have data, that are contributing to this problem.

Odds Ratio

The odds ratio (OR) is another measure of association that is often used to approximate relative risk. The OR is usually estimated when the population-at-risk is unknown and therefore relative risks cannot be calculated.

An OR can be estimated when population-at-risk data are available, but it does not require them. Odds ratios only require data collected on cases and non-cases, and in these situations it is often not possible to estimate a population at risk, so the RR cannot be used.

Imagine a situation where you are investigating a rare disease. You visit farms every two weeks and identify any animals that develop the disease of interest. Each time you identify an animal with the disease, you select another animal from the same farm that does not have the disease (non-case). You then collect risk factor information on cases and non-cases.

This is a case-control study. Over time you build up a dataset with a number of cases and a number of non-cases. However, you never have an estimate of the population at risk. Using this dataset, you cannot estimate incidence, prevalence or relative risk. You can estimate the OR as a measure of the strength of association between disease and risk factors.

Odds ratios are calculated using the same 2x2 table structure as for relative risks but the formula is different. As an epidemiologist you will need to know when RR is able to be estimated and when it cannot be estimated.

Table showing 2x2 layout and calculations required to estimate an odds ratio

Risk factor | Diseased | Not diseased | Total
Exposed | a | b | a + b
Unexposed | c | d | c + d
Total | a + c | b + d | a + b + c + d

Odds in the exposed = \frac{a}{b}

Odds in the unexposed = \frac{c}{d}

\text{Odds ratio} = \mathit{OR} = \frac{a/b}{c/d} = \frac{ad}{bc}
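The odds ratio and an approximate 95% confidence interval can be sketched as follows, using Woolf's log method for the interval (one common choice; calculators may use other methods). The function name is illustrative, and the counts are from a hypothetical case-control study with 100 cases and 100 non-cases.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio ad/bc with an approximate 95% CI (Woolf's log method).

    a, b = exposed cases, exposed non-cases
    c, d = unexposed cases, unexposed non-cases
    All four cells must be non-zero for this interval.
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


# Hypothetical case-control data: 40 of 100 cases and 20 of 100 non-cases exposed.
or_, lo, hi = odds_ratio(a=40, b=20, c=60, d=80)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```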

The Odds Ratio is usually interpreted like a relative risk.

When a disease is rare, the numeric value of the odds ratio is close to that of the RR. To see why, compare the relative risk and the odds ratio formulas:

  • a is small compared to b, and c is small compared to d
  • therefore, (a+b) approximates b and (c+d) approximates d
  • therefore, the RR approximates ad/bc
  • so the OR estimates the RR as long as the disease is rare (say, below about 10%; the approximation improves as the disease becomes rarer)
  • When the disease is more common, the OR provides a poorer numeric approximation of the RR.
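The rare-disease approximation can be checked numerically. Both tables below are hypothetical and are constructed so that the risk in the exposed is twice the risk in the unexposed (RR = 2); only the overall frequency of disease differs.

```python
def rr_and_or(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Return (relative risk, odds ratio) for a 2x2 table laid out as a, b / c, d."""
    rr = (a / (a + b)) / (c / (c + d))
    or_ = (a * d) / (b * c)
    return rr, or_


# Hypothetical tables, both with a true risk ratio of 2.
rare = rr_and_or(a=10, b=990, c=5, d=995)       # risks of 1% vs 0.5%
common = rr_and_or(a=400, b=600, c=200, d=800)  # risks of 40% vs 20%

print(f"Rare disease:   RR = {rare[0]:.2f}, OR = {rare[1]:.2f}")
print(f"Common disease: RR = {common[0]:.2f}, OR = {common[1]:.2f}")
```

For the rare disease the OR is almost identical to the RR, whereas for the common disease the OR (about 2.67) is noticeably further from 1 than the RR.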

We can use confidence intervals and statistical tests to interpret the strength of association for OR in the same way that we can for RR.

Attributable risk (AR)

Attributable risk (also called risk difference) is a measure of association that is based only on the exposed population. It is the absolute difference between the two incidence rates from a 2 x 2 table. The AR tells the rate of disease in the exposed population that is attributable to being exposed. AR provides an estimate of the rate of disease that could be prevented if the exposure were removed completely from the population.

Table showing 2x2 layout and calculations required for estimation of attributable risk

Risk factor | Diseased | Not diseased
Exposed | a | b
Unexposed | c | d

\mathit{IR}_{\text{exp}} = \frac{a}{a + b}

\mathit{IR}_{\text{unexp}} = \frac{c}{c + d}

\text{Attributable risk} = \mathit{AR} = \mathit{IR}_{\text{exp}} - \mathit{IR}_{\text{unexp}} = \frac{a}{a + b} - \frac{c}{c + d}

The AR has the same units as the IR and can theoretically vary from -1 to +1; the null value is zero. Remember that the RR has no units and has a null value of 1.0.

We can use confidence intervals and statistical tests to interpret the strength of association for AR in the same way that we can for RR and OR.
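A minimal sketch of the attributable risk (risk difference) with an approximate 95% confidence interval. The interval uses a simple Wald approximation, which is one of several methods and can perform poorly with small counts. The function name is illustrative, and the example reuses the EUS counts for small (exposed) and large (unexposed) fish.

```python
import math


def attributable_risk(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Risk difference IRexp - IRunexp with an approximate (Wald) 95% CI."""
    n_exp, n_unexp = a + b, c + d
    ir_exp, ir_unexp = a / n_exp, c / n_unexp
    ar = ir_exp - ir_unexp
    se = math.sqrt(ir_exp * (1 - ir_exp) / n_exp + ir_unexp * (1 - ir_unexp) / n_unexp)
    return ar, ar - z * se, ar + z * se


# Small fish (exposed) vs large fish (unexposed) from the EUS example.
ar, lo, hi = attributable_risk(a=300, b=700, c=100, d=900)
print(f"AR = {ar:.2f} (95% CI {lo:.3f} to {hi:.3f})")
```

Here the null value is 0, so an interval lying entirely above 0 indicates excess risk in the exposed group.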

Attributable fraction (AF)

The attributable fraction (AF) is also only relevant for the exposed population and expresses the AR as a fraction of the incidence rate among the exposed - this measure indicates the proportion of disease in the exposed that could have been prevented had exposure not occurred.

\mathit{AF} = \frac{\mathit{IR}_{\text{exp}} - \mathit{IR}_{\text{unexp}}}{\mathit{IR}_{\text{exp}}} = \frac{\mathit{RR} - 1}{\mathit{RR}}

AF_{case-control} = \frac{OR - 1}{OR}
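A sketch of the AF formulas using hypothetical incidence values. For case-control data the OR is simply substituted for the RR, which assumes the disease is rare enough for the OR to approximate the RR:

```python
def af_from_rates(ir_exp, ir_unexp):
    """AF = (IRexp - IRunexp) / IRexp."""
    return (ir_exp - ir_unexp) / ir_exp


def af_from_ratio(ratio):
    """AF = (RR - 1) / RR; pass an OR instead of the RR for case-control data."""
    return (ratio - 1) / ratio


ir_exp, ir_unexp = 0.30, 0.10                       # hypothetical incidence values
print(round(af_from_rates(ir_exp, ir_unexp), 3))    # 0.667
print(round(af_from_ratio(ir_exp / ir_unexp), 3))   # 0.667 (RR = 3)
print(round(af_from_ratio(2.5), 3))                 # 0.6 (hypothetical OR = 2.5)
```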

Biological meaningfulness and 95% Confidence intervals

Relative risk and odds ratio estimates provide measures of association between disease occurrence and a risk factor. They provide an estimate of how much exposure to the risk factor increases (or decreases) the rate or amount of disease in a population.

  • RR or OR less than 1 (exposure is protective)
  • RR or OR equal to 1 (exposure neither increases nor decreases risk)
  • RR or OR greater than 1 (exposure increases risk)

Most statistical packages or online calculators that can estimate RR or OR will also provide a 95% confidence interval for the estimate and use statistical significance tests, such as the Chi-square test, to determine if the RR or OR is significantly different to 1.
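The sketch below shows one common way such output is produced: log-based 95% confidence intervals (Woolf's method for the OR, Katz's method for the RR) and a Pearson chi-square test without continuity correction. These are assumptions for illustration; packages may use exact or other methods and give slightly different results, especially with small counts, and these formulas fail when a cell is zero. The counts are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def odds_ratio_ci(a, b, c, d, z=Z95):
    """OR with a Woolf (log-based) 95% CI. All four cells must be > 0."""
    est = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return est, math.exp(math.log(est) - z * se), math.exp(math.log(est) + z * se)


def risk_ratio_ci(a, b, c, d, z=Z95):
    """RR with a log-based (Katz) 95% CI. a and c must be > 0."""
    est = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return est, math.exp(math.log(est) - z * se), math.exp(math.log(est) + z * se)


def chi_square_test(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) and its p-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(chi-square > x) equals P(|Z| > sqrt(x)) for a standard normal Z
    p = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    return chi2, p


table = (30, 70, 10, 90)  # hypothetical counts: a, b, c, d
print("OR %.2f (95%% CI %.2f to %.2f)" % odds_ratio_ci(*table))
print("RR %.2f (95%% CI %.2f to %.2f)" % risk_ratio_ci(*table))
print("chi-square %.2f, p = %.4f" % chi_square_test(*table))
```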

An estimate that is associated with a significant p-value (p<0.05) is considered potentially important, but we need to consider both the p-value and the 95% confidence interval for the estimate. Statistical tests only tell us how likely a result at least as extreme as the one observed would be if chance alone were operating; they tell us nothing about the biological importance of the risk factor. A risk factor may have a statistically significant effect in a particular study but not be biologically important, and vice versa.

The numeric estimate of the RR (or OR) and the 95% confidence interval often provide more useful information on the biological importance of the association.

When the total sample size is small, all estimates are imprecise (their confidence intervals are wide) and are therefore of limited use.

Rabies vaccine baits for wildlife were dropped from planes. The baits contained a tetracycline marker, so foxes that ate them could be identified as vaccinated from the staining of their teeth. Over time, foxes that were killed or found dead were assigned to a 2x2 table based on whether they were rabies positive or negative, and vaccinated or unvaccinated.

Table: vaccination status by rabies status (Rabies +, Rabies -)

OR = 2.29, 95% CI 0.39 to 13.33

The OR suggests a potentially important association, but the confidence interval is wide and spans 1. This is probably because of the relatively small sample size.

The same wildlife study is continued until additional samples have been obtained, and the analysis is repeated.

Table: vaccination status by rabies status (Rabies +, Rabies -)

OR = 2.30, 95% CI 0.97 to 5.45

The OR is similar, but the confidence interval now lies almost entirely above 1. It extends just below 1 and the p-value is 0.06 (just not significant). However, the fact that most of the confidence interval lies above 1 is reasonable evidence of a real association between the risk factor (vaccination status) and the odds of rabies.
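The cell counts for the rabies example are not reproduced here, so the sketch below uses hypothetical counts chosen to give an OR of about 2.3. Multiplying every cell by four keeps the OR unchanged but narrows the log-based (Woolf) 95% confidence interval, the same pattern seen between the two rabies analyses:

```python
import math


def or_with_ci(a, b, c, d, z=1.96):
    """OR with a Woolf (log-based) 95% CI; all cells must be > 0."""
    est = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return est, math.exp(math.log(est) - z * se), math.exp(math.log(est) + z * se)


small = (6, 13, 5, 25)               # hypothetical counts, not the study's data
large = tuple(4 * n for n in small)  # same proportions, four times as many animals

for label, table in (("small", small), ("large", large)):
    est, lo, hi = or_with_ci(*table)
    print(f"{label}: OR = {est:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```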

Advanced statistical analyses

Once potential risk factors have been identified and their importance assessed in unadjusted screening analyses such as 2x2 tables, it may be useful to undertake further statistical analyses.

Attack rates provide measures of disease occurrence: how much disease is occurring? Producing attack rate estimates for different levels of other factors provides additional information on how much of the disease may be due to exposure to a particular risk factor.
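A minimal sketch of attack rates calculated for each level of one factor (here a hypothetical age grouping) from per-animal records; the records are invented for illustration:

```python
from collections import defaultdict

# Hypothetical per-animal records: (age group, diseased?)
records = [
    ("<1 yr", True), ("<1 yr", True), ("<1 yr", False), ("<1 yr", False),
    ("1-3 yr", True), ("1-3 yr", False), ("1-3 yr", False), ("1-3 yr", False),
    (">3 yr", False), (">3 yr", False), (">3 yr", False), (">3 yr", False),
]


def attack_rates(records):
    """Return {level: (cases, animals, attack rate)} for each level of the factor."""
    counts = defaultdict(lambda: [0, 0])  # level -> [cases, animals]
    for level, diseased in records:
        counts[level][1] += 1
        if diseased:
            counts[level][0] += 1
    return {level: (cases, n, cases / n) for level, (cases, n) in counts.items()}


for level, (cases, n, rate) in attack_rates(records).items():
    print(f"{level}: {cases}/{n} = {rate:.0%}")
```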

Relative risk and odds ratio estimates then provide measures of the strength of association between disease and possible risk factors. We can use statistical testing to determine if the observed association is likely to be due to chance or not - we use this information to try and determine if the apparent relationship is important or not.

Potential risk factors can have a high relative risk but not be statistically significant, or vice versa, depending largely on sample size. Therefore, it is important to always consider high relative risk values (say >3) as worth further investigation, even if they are not statistically significant. This is particularly true when the estimate is based on a small total count of animals (small sample size).

When there are multiple possible factors that may influence disease risk the first approach is to perform simple estimates of AR for each factor one at a time. These are called unadjusted or crude or screening associations. More advanced analyses may involve considering multiple factors at once (multivariable analyses) in order to adjust for interaction or confounding between different factors. These methods are beyond the scope of these notes and require advanced statistical expertise and special software.
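A sketch of crude (unadjusted) screening: a 2x2 table is built from per-animal records for each yes/no factor in turn, ignoring all other factors, and the AR and RR are reported. The factor names and records are hypothetical:

```python
# Hypothetical records: (diseased, new_feed, vaccinated) for eight animals
FACTORS = ("new_feed", "vaccinated")
rows = [
    (True, True, False), (True, True, True), (True, True, False), (False, True, True),
    (True, False, False), (False, False, True), (False, False, True), (False, False, False),
]
animals = [dict(zip(("diseased",) + FACTORS, row)) for row in rows]


def crude_2x2(animals, factor):
    """Count a, b, c, d for one yes/no factor, ignoring all other factors."""
    a = sum(1 for x in animals if x[factor] and x["diseased"])
    b = sum(1 for x in animals if x[factor] and not x["diseased"])
    c = sum(1 for x in animals if not x[factor] and x["diseased"])
    d = sum(1 for x in animals if not x[factor] and not x["diseased"])
    return a, b, c, d


def screen(animals, factors):
    """Crude AR and RR for each factor, one factor at a time."""
    results = {}
    for factor in factors:
        a, b, c, d = crude_2x2(animals, factor)
        if a + b == 0 or c + d == 0:
            results[factor] = None  # all (or no) animals exposed: nothing to compare
            continue
        ir_exp, ir_unexp = a / (a + b), c / (c + d)
        rr = ir_exp / ir_unexp if ir_unexp > 0 else float("inf")
        results[factor] = {"AR": ir_exp - ir_unexp, "RR": rr}
    return results


for factor, result in screen(animals, FACTORS).items():
    print(factor, result)
```

Because each factor is examined alone, these crude estimates can be distorted by confounding, which is what the multivariable methods mentioned above are designed to address.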

Developing hypotheses and control measures

When you are investigating a disease outbreak and particularly in an emergency situation it is important to begin to develop hypotheses about the nature of the disease as soon as possible and to use this information to identify possible causes and interim recommendations that may help to control the disease and prevent further cases.

Generating hypotheses about the disease means using the description of the disease patterns, attack rates and other initial analyses to inform either a differential diagnosis list or, if this is not possible, a guess at the sort of disease process that may be occurring (infectious or non-infectious, point-source or propagating outbreak).

Examples of hypotheses relevant for disease investigations include:

  • the nature of the causal agent (eg toxin, infectious, viral, bacterial, etc)
  • the source of the agent (eg environmental, species jump, introduced animals, endemic infection, etc)
  • the method(s) of transmission (eg direct contact, food-borne, vector-borne, etc)
  • why the incident has occurred (eg change in herd immunity levels, introduction of new disease, change in management practices, etc)
  • risk factors for disease (eg exposure to specific feed components, or potential sources of infection)

The disease hypotheses are then used to inform interim recommendations for control measures. Control measures are any interventions that aim to reduce the occurrence of disease or to eradicate it. Treatment of sick animals is one form of control measure; interventions to prevent spread or to eradicate disease from an area are others.

Disease hypotheses should be based on the facts gathered during the initial investigation. It may be possible to draw up a causal diagram for the disease showing how the various factors interact to cause the disease. This process helps to understand the disease process and can often lead to an improved understanding of the relationships between possible risk factors. Consideration of these relationships will often help identify points where intervention can be made to control and/or prevent the disease occurring.

In many situations, initial hypotheses can then be tested using further investigations conducted while the outbreak is still under investigation and while interim control measures are being implemented.

For example, if you suspected one or more specific disease agents you may be able to collect samples and send them off for testing to rule in or rule out those particular diseases. If the results confirm your initial hypotheses then your management of the situation may be clarified. If the results rule out the initial hypotheses then further epidemiologic investigation will be required.

Some measures may be implemented based on general precautions without any knowledge of what the disease might be. For example, isolation and quarantine of the affected properties and affected animals within a property, symptomatic treatment of affected animals and so on. In many situations the results of the initial investigation are used to inform interim control measures. As further information is collected these initial control measures can then be modified.

Actual measures implemented will depend on the individual circumstances, but could include one or more of the following:

  • Specific treatments;
  • Vaccination;
  • Changes in nutrition, feed ingredients and/or management factors;
  • Isolation or quarantine;
  • Surveillance of the affected population and other at-risk populations for evidence of further spread and new cases;
  • Changes in environment and/or housing;
  • Safe destruction and disposal of contaminated waste or other infectious materials;
  • Disinfection and decontamination; and
  • Salvage sale or slaughter of animals.

Once hypotheses have been formulated, it is important to review and evaluate them. In particular:

  • Do they explain the observations?
  • Are they reasonable?
  • Are there any facts that contradict the hypothesis and how can these be explained?
  • Are there any unexplained aspects of the situation requiring further investigation and evaluation?
  • What additional data do we need to test the hypotheses, or is there sufficient data already available?

In some cases, it will not be possible to stop an outbreak once it starts, but the detailed investigation of one or more outbreaks should provide valuable insight into possibly important "component" causes and support the development of strategies to prevent future outbreaks.

For Menangle virus, based on the observations during the outbreak, it was hypothesised that:

  • The outbreak was a propagating epidemic of a previously unidentified virus causing infertility, stillbirths, mummified foetuses and congenital deformities
  • The probable source of infection was from a fruit bat colony, either on fly-past or entry to sheds or laneways
  • Spread within the piggery occurred via close contact and fomites during acute infection and at farrowing

An obvious conclusion from this was that the easiest way to prevent future outbreaks was to prevent any contact between pigs and fruit bats by enclosing and screening all sheds and laneways.

Sera and faeces were collected from fruit bats from the nearby colony to test the hypothesis that the bats were the source of this virus. Forty of 80 (50%) serum samples were positive, but virus could not be isolated from faeces from 55 bats.

Based on the findings from the various investigations it was not deemed possible to prevent continuing spread of the virus at the time of the outbreak. Instead, it was decided to undertake a staged eradication program once the main epidemic had burned out, including:

  • Progressive eradication from the four production Units
  • Segregation, depopulation and staged repopulation of Units
  • Sheds and walkways "flying-fox-proofed" to prevent re-introduction
  • Serological testing to monitor progress

Successful eradication was achieved and subsequently demonstrated by on-going monitoring of the population.

Role of tracing in disease investigation

Tracing of livestock movements is an important tool for the detection of infected herds or flocks, particularly where there is interest in eradicating a disease. Tracing usually involves identifying potentially infected farms by tracing the movements of infected or exposed animals.

If there is no clear policy to control or eradicate the disease then there may be little justification to do tracing.

Further testing is usually undertaken on any other farms identified by tracing and considered to be at risk of having infected animals to establish their true infection status. If a farm's infection status cannot be determined immediately, quarantine measures may be imposed until the situation is resolved.

Further field epidemiology studies

The initial disease investigation aims to describe the disease cases, identify possible causes and implement control measures. In some cases, it may be possible to confidently diagnose a specific disease and implement effective treatment and control measures. In this situation there may be little need to conduct further investigation.

In many situations there will be a need to conduct further field investigations to provide additional information on one or more of the following:

  • Gather more information on possible causes of the disease including identification of the infectious agent (if there is one) and identify other causes.
  • Use tracing to identify the origin of the disease and risk of spread to other farms or locations.
  • Monitor the effects of control measures to measure their efficacy, apply new measures if necessary, and ensure that no new cases are occurring and that affected animals are recovering.
  • Use field or experimental studies to test hypotheses arising from the initial investigations about possible causes and control measures.

Menangle virus was first identified in 1997, following an investigation of a serious outbreak of mummified and stillborn foetuses in a commercial piggery at Menangle, New South Wales, Australia. A high proportion of litters born to sows that were pregnant at the time of exposure to the virus were affected, although clinical disease was not noticed at the time of infection. After an extensive investigation the infection was traced to a nearby colony of fruit bats (flying foxes), with a high proportion of bats sampled found to be seropositive for Menangle virus antibodies.

During the Menangle virus investigations, a wide range of additional investigations were undertaken, including:

  • Detailed pathological, serological, microbiological and virological examination of affected and unaffected pigs to determine the likely cause and to rule out known infections and other diseases.
  • Cross-sectional serological survey of all units/sheds to determine the extent and progress of infection.
  • Surveys of pigs and piggeries in contact to determine whether infection had spread beyond the Menangle piggery.
  • Sampling of unexposed piggeries to demonstrate freedom of the rest of the industry.
  • Testing of archived sera (from this and other piggeries nation-wide) to demonstrate that it was a new infection not previously present in the Australian pig population.
  • Interview and testing of piggery workers and others potentially exposed to evaluate public health risks.
  • Serology on other species as potential sources.
  • Serology and virus isolation on fruit bats to support the hypothesis that they were the likely source.

Types of epidemiologic studies

Epidemiologic studies can be broadly grouped into observational studies, intervention studies and theoretical studies.

The characteristics of different study types are described briefly below and the advantages and disadvantages of each type are summarised. For more information on epidemiological study design, readers should consult standard epidemiology texts (Martin et al., 1987; Thrusfield, 1995; Rothman et al., 2008; Dohoo et al., 2010).

In disease investigations, almost all field studies will be observational studies.

Figure: Classification of epidemiologic study types

Observational studies

In observational studies nature is allowed to take its course, and the study aims to collect data by observing what happens without intervention from the investigator. There are basically four types of observational study: descriptive study, cross-sectional study, case-control study and cohort study. Cross-sectional, case-control and cohort studies may also be termed analytic studies because they usually involve some form of statistical testing of various hypotheses.

Descriptive studies

A descriptive study can collect data to describe distribution and occurrence of a disease in a population but does not involve statistical hypothesis testing of possible risk factors. The first part of the investigation (describing disease in terms of time, place and animal) is largely descriptive but it can lead to hypotheses that may be tested and the data may be used in subsequent statistical testing of some sort.

Cross-sectional studies

Cross-sectional studies involve selection of animals at a point in time (or over a defined period) and then the prevalence of the disease in question is measured and data gathered on other factors to allow comparison between presence or absence of disease and presence or absence of various possible risk factors. Cross-sectional studies can be done quickly and cost-effectively but are less effective for testing hypotheses about causation of disease.

Figure: Cross-sectional studies

For example, you might undertake a randomised cross-sectional study of villages in a country for exposure to foot-and-mouth disease (FMD) virus. This would allow you to estimate the seroprevalence and to identify possible risk factors for exposure to support either follow-up studies and/or planning for future management of FMD.

Case-control studies

A "case" group is selected from animals with the disease of interest and a "control" group is selected from animals without the disease. The presence or absence of possible risk factors is then measured in the two groups and compared. Case-control studies are well suited to rare diseases and many suspected risk factors can be compared at the same time. They are relatively quick and inexpensive to perform but are susceptible to many biases and do not allow estimation of disease frequency (prevalence or incidence).

Case-control studies are very commonly used in disease investigation, particularly in the early stages, because they can be implemented as soon as a case definition has been completed and animals have been assigned as cases or non-cases. This means that a case-control study can begin while the early stages of the disease investigation are still under way.

Figure: Case-control studies

For example, you might undertake a case-control study for foot-and-mouth disease occurrence in village livestock. Case villages would be selected from known affected villages while controls would be selected from unaffected villages in the same region. This would allow you to identify village-level risk factors for infection, to support planning for prevention and management of future outbreaks.

Cohort studies

The word cohort just means any group of animals that is followed over a period of time.

In a cohort study animals that are free of the disease of interest are selected based on presence or absence of one or more defined risk factors (presence of the risk factor = exposed group and absence of the risk factor = unexposed group). The selected animals are then monitored forward in time to measure the occurrence of the disease of interest in each group. In some cases where detailed retrospective records are available it may be possible to use retrospective data to perform a cohort study but the approach is still the same.

Cohort studies can provide incidence rates for the disease in the exposed and unexposed groups, and they provide stronger evidence for causation of disease than either cross-sectional or case-control studies. However, they are more expensive and take longer to plan and complete.

Figure: Cohort studies

The best known examples of cohort studies are numerous studies investigating health outcomes associated with cigarette smoking. Comparison of health outcomes between smokers and non-smokers has allowed researchers to quantify the increase in risk of lung cancer, cardio-vascular disease and other health problems associated with increased levels of smoking.

Intervention studies

A field intervention study is a type of clinical or experimental trial.

The key distinction from observational studies is that animals are randomly assigned to two or more treatment or intervention groups and then the effects of these different interventions are compared. Intervention studies may be used to test the efficacy of various treatments for disease control (vaccination compared with no vaccination, or different management or treatment strategies).

Figure: Intervention studies

For example, mineral deficiencies can often result in poor growth and even death of young sheep or cattle. Often you may suspect that a particular mineral is deficient but be unable to demonstrate this conclusively. One way of achieving this is to run a field trial, comparing growth rates in treated and untreated groups that are similar in all other ways.
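A sketch of the random allocation step for such a trial, plus the comparison of mean weight gain between groups. The ear tags, the seed and the gain values are hypothetical; a real trial would also need an appropriate sample size and a statistical test of the difference:

```python
import random
from statistics import mean


def randomise(animal_ids, seed=None):
    """Shuffle the animals and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    ids = list(animal_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]  # (treated, untreated)


def difference_in_mean_gain(gains, treated, untreated):
    """Mean weight gain of treated minus untreated animals (gains: id -> kg)."""
    return mean(gains[i] for i in treated) - mean(gains[i] for i in untreated)


animals = [f"lamb{i:02d}" for i in range(1, 21)]  # hypothetical ear tags
treated, untreated = randomise(animals, seed=1)
print("Supplement:", treated)
print("No supplement:", untreated)
```

Fixing the seed makes the allocation reproducible for the trial record, while the shuffle itself ensures the investigator does not choose which animals are treated.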

Theoretical studies

Theoretical epidemiology studies are based on computer-based mathematical modelling and use simulation to answer "what-if" type questions. There is a wide variety of modelling methods, but the primary aim is to reproduce a realistic simulation of disease behaviour in a population. The major benefits of models are that:

  • The process of developing and interpreting the model often leads to valuable insights into disease epidemiology and behaviour that might not otherwise be apparent; and
  • Models provide a structured and controlled environment in which hypothesised interventions can be tested and evaluated at significantly lower cost than undertaking field experiments or observations to achieve the same result (or for interventions that may not be practical to implement experimentally).

Models are particularly useful in examining the behaviour and impact of infectious diseases as well as the possible effects of a range of interventions. The results from such studies need to be confirmed with follow-up observational or intervention studies wherever possible.

Theoretical modelling generally involves advanced mathematical and statistical skills and custom software.
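As a toy illustration of a "what-if" question, the sketch below uses the textbook SIR (susceptible, infected, recovered) compartmental model, swapped in here as one simple modelling method. The transmission rate beta, recovery rate gamma and herd size are hypothetical values not calibrated to any disease; the model compares the number of animals ever infected with and without a control measure that halves transmission:

```python
def simulate_sir(beta, gamma, n, i0, days, dt=0.1):
    """Deterministic SIR model solved with simple Euler steps.

    beta: transmission rate per day; gamma: recovery rate per day;
    n: population size; i0: initially infected animals.
    Returns the number of animals infected at some point by the end.
    """
    s, i = n - i0, float(i0)
    for _ in range(int(days / dt)):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
    return n - s


# What if contact (and so beta) were halved by a hypothetical control measure?
baseline = simulate_sir(beta=0.5, gamma=0.2, n=1000, i0=1, days=500)
controlled = simulate_sir(beta=0.25, gamma=0.2, n=1000, i0=1, days=500)
print(f"ever infected: {baseline:.0f} without control, {controlled:.0f} with control")
```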

Figure: Theoretical studies

For example, simulation models of the spread of FMD have been used to help understand the behaviour of the 2001 outbreak in the UK and to predict the potential impact of alternative control strategies (Morris et al., 2001).

Characteristics, strengths and weaknesses of main study types (adapted from Thrusfield, 1995)

| Study type | Characteristics | Advantages | Disadvantages |
|---|---|---|---|
| Descriptive | Observational; describes patterns of disease in the population | Relatively quick and easy; can generate hypotheses on possible risk factors for further investigation; doesn't require random sampling or a high degree of rigour | Doesn't support hypothesis testing or inference for possible risk factors; can't estimate prevalence, incidence or exposure proportions; subject to inherent biases and errors because of the nature of the data |
| Cross-sectional | Observational; observation at a point in time; outcome/exposure not considered in selection | Disease prevalence in exposed and unexposed populations can be estimated; exposure proportions can be estimated; relatively quick and cost-effective; can study multiple factors at once | Unsuited to investigating rare diseases; less useful for acute diseases; may be difficult to control potential confounders; incidence cannot be estimated; may be difficult to determine causality; may be problems with reliability of data/recall for historical data |
| Case-control | Observational; retrospective longitudinal; selection based on outcome status | Good for rare diseases; relatively rapid and cost-effective; relatively small sample sizes; often use existing data; can study multiple factors at once | May be difficult to establish causality; can't estimate prevalence, incidence or exposure proportions; rely on access to historical data or recall; difficult to validate data; may be affected by variables for which data is not collected; selection of controls often difficult |
| Cohort | Observational; prospective longitudinal; selection based on exposure status | Can calculate incidence in exposed and unexposed individuals; can provide strong evidence for causality | Exposed/unexposed proportions cannot be estimated; large sample sizes, particularly for rare diseases; can only investigate a small number of potential risk factors at any one time; long duration of follow-up; relatively expensive and time-consuming; loss of individuals to follow-up; may be difficult to control potential confounders |
| Field/clinical trials | Intervention; longitudinal; randomised selection | Relatively quick; good for helping establish causation; usually strong internal validity; relatively small sample size and usually short duration | Can't estimate incidence/prevalence; may be problems with external validity, particularly to a diverse target population; can be expensive depending on the intervention and situation; requires significant cooperation and rigorous management |

Choice of field study

Field epidemiology for disease investigation will almost always involve observational studies and only very occasionally intervention studies.

Case-control and cross-sectional studies are very well suited to the early stages of a disease investigation because the investigation has to proceed through a case definition and then assignment of animals to either cases or non-cases followed by collection of data to describe disease patterns in time, place and animal.

Case-control studies do not have a population at risk. They are based on selection of cases and a separate selection of non-cases (controls). As a result case-control studies cannot be used to produce prevalence or incidence measures. Because of this, they cannot be used to produce relative risk measures. Data from case-control studies can produce odds ratios and these are the primary measure of strength of association for case-control studies. Case-control studies are particularly useful for studying rare diseases because cases can be detected and then enrolled in the study along with one or more controls.
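A sketch of why the OR, but not the RR, can be used with case-control data. It starts from a hypothetical source population, takes all cases plus an idealised set of controls (drawn in exactly the exposure proportions of the non-cases, with no sampling error), and varies the number of controls per case. The OR stays the same as in the source population, while an "RR" calculated as if the table came from a cohort changes with the arbitrary control ratio:

```python
# Hypothetical source population (counts of animals)
EXP_CASES, EXP_NONCASES = 20, 980
UNEXP_CASES, UNEXP_NONCASES = 10, 1990


def measures(a, b, c, d):
    """Return (OR, 'RR' computed as if the table came from a cohort)."""
    odds_ratio = (a * d) / (b * c)
    naive_rr = (a / (a + b)) / (c / (c + d))
    return odds_ratio, naive_rr


def case_control_table(controls_per_case):
    """Idealised case-control sample: all cases, plus controls drawn in exactly
    the exposure proportions of the non-cases (no sampling error)."""
    n_controls = controls_per_case * (EXP_CASES + UNEXP_CASES)
    share_exposed = EXP_NONCASES / (EXP_NONCASES + UNEXP_NONCASES)
    b = n_controls * share_exposed
    d = n_controls - b
    return EXP_CASES, b, UNEXP_CASES, d


print("population (OR, RR):", measures(EXP_CASES, EXP_NONCASES, UNEXP_CASES, UNEXP_NONCASES))
for k in (1, 2, 4):
    or_, rr = measures(*case_control_table(k))
    print(f"{k} control(s) per case: OR = {or_:.2f}, naive RR = {rr:.2f}")
```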

Cohort studies are generally likely to be used in later stages of a disease investigation or in follow-up studies to test hypotheses about causal factors. Cohort studies start with a population that does not have the disease (population at risk) and detect new cases of disease. They can measure incidence rate and are used to produce relative risks as a measure of association between disease and risk factors. Cohort studies may also be more useful when there is interest in assessing specific exposures and particularly rare exposures since these can be defined and used to select the groups at the beginning of the study.

Defining groups based on outcome and exposure

The general purpose of field observational studies is usually to collect data on a number of animals with a particular focus on disease status (presence of disease = case, absence of disease = non-case), and on the presence or absence of one or more risk factors. We can then compare the diseased group to the non-diseased group to look for differences in risk factors.

The term outcome refers to whether or not an animal is recorded as a case (disease present) or non-case (disease absent). Information is provided in the Basic Field Epidemiology Manual about development of the case definition and its use, in conjunction with clinical examination and possibly laboratory tests or other information, to assign animals to confirmed case, suspect case and non-case categories.

When designing and analysing field studies many people use the term exposure. Exposure refers to the presence or absence of some risk factor or the level or category of a risk factor for each animal. Examples of possible risk factors that may be measured include species, breed, sex, body weight, age, vaccination status, recent treatments or other management procedures, place of origin, location of animals, recent climate (rainfall, temperature, humidity), feed, recent movements, pregnancy status etc.

Some risk factors do not change over time (breed, species and often sex) and may be described as fixed (unchanging). Other risk factors (age, body weight, feed, location, recent treatments) may change over time and may be measured at a defined time or used to develop categories (age less than 1 year or greater than 1 year) or averages.

Some risk factors are measured as categories (sex=male entire, male castrate, female), or scores (body condition score: 1=poor, 2=backward (thin), 3=moderate (no significant fat), 4=forward, 5=fat), or as a measurement on a continuous scale (body weight measured on a scale in kg).

Selection of animals

In a case-control study, the cases must have the disease being investigated, meaning that they must meet the case definition. The case definition must also be independent of the risk factors that are being studied. It is generally best to use the confirmed case definition and to try and restrict the selected cases to those animals that are likely to be recent cases as opposed to animals that may have been infected some time ago (chronic cases). If some of the cases die as a result of the disease then studying chronic cases may be more informative about factors that influence survival than factors that determine risk of occurrence of disease.

A comparable group of non-cases (controls) then has to be selected for comparison. Selection of controls for a case-control study is a complex subject and these notes provide brief coverage of simple points.

Controls should not have the disease (meet the definition for non-case) and should be identified independent of any possible risk factor of interest. The controls should generally be representative of the population from which the cases arose. This means if the cases came from animals admitted to one or more veterinary clinics then the controls should be selected from those same clinics. The same applies to farms.

Controls should be eligible to get the disease, and to be selected as cases had they developed it. This means that if the disease is causing abortion in pregnant cattle, male cattle would not be selected as controls. Any animal that has had the disease earlier, recovered and is considered immune should not be selected as a control, because it would not be expected to get the disease again.

Where the investigation involves a relatively small area (one farm or one village) it may be possible to assess all (or nearly all) animals at that location and assign them to either cases or non-cases. These two groups may then form the basis of a study. If you are able to collect data on all animals at the location and if you can confidently determine onset of disease then this will form a cohort study and not a case-control study and you will be able to estimate incidence or prevalence measures and use relative risk as the statistical measure of association between disease and risk factors.

Where the investigation involves multiple farms or a larger area or number of animals and it is not possible to assess and include every animal in a study, it will be necessary to select a subset or sample of cases and controls. If you have a list of non-case animals, then random sampling can be used to select a group of controls. More commonly you will not have a list of all animals to choose from. In this case it may be most practical to select one or more controls for each case from those animals that are close to the case (same paddock or pen) and that meet the definition for a non-case. In an open population (where animals are entering and leaving the local population), controls should be selected from those animals with a similar exposure time to the cases.

Most case-control studies select one control for every case. There may be some value in selecting 2-3 controls for every case because of the increased sample size but there is likely to be little value in selecting more than 3-4 controls per case.
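A sketch of selecting controls for each case from non-case animals in the same pen, without using any control twice. The data structure, pen names and seed are hypothetical, and the code assumes every listed non-case already meets the non-case definition and is eligible to get the disease:

```python
import random


def select_controls(case_ids, animals, controls_per_case=1, seed=None):
    """For each case, randomly pick non-case animals from the same pen.

    animals: dict of animal id -> {"pen": pen name, "case": True/False}.
    A control is never used for more than one case.
    """
    rng = random.Random(seed)
    used = set()
    selected = {}
    for case_id in case_ids:
        pen = animals[case_id]["pen"]
        pool = [aid for aid, info in animals.items()
                if info["pen"] == pen and not info["case"] and aid not in used]
        picks = rng.sample(pool, min(controls_per_case, len(pool)))
        used.update(picks)
        selected[case_id] = picks
    return selected


# Hypothetical animals in two pens
animals = {
    "A1": {"pen": "north", "case": True},
    "A2": {"pen": "north", "case": False},
    "A3": {"pen": "north", "case": False},
    "A4": {"pen": "north", "case": False},
    "B1": {"pen": "south", "case": True},
    "B2": {"pen": "south", "case": False},
    "B3": {"pen": "south", "case": False},
}
cases = [aid for aid, info in animals.items() if info["case"]]
print(select_controls(cases, animals, controls_per_case=2, seed=42))
```

If a pen has too few eligible non-cases, the case simply receives fewer controls; an investigator would need to decide whether to widen the pool (for example to neighbouring pens).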

Sometimes controls are selected in a process that involves matching on a defined attribute with each case. For example, a 2-year old cow with diarrhoea (case) may be matched with a 2-year old cow without diarrhoea (matched control). Matching is usually used to control for some form of confounding between the factor being used for matching (age) and the association between some other possible causal factors and the disease of interest. When a factor is matched it cannot be analysed for an association with disease. Matching is a complex topic and should not be attempted without advice from a veterinary epidemiologist.

There is also a range of issues relating to the selection of animals for a cohort study. Animals eligible for inclusion in a cohort study should be free of the disease of interest at the start of the study and should be at risk of developing the disease. Animals that have previously had the disease, recovered and are now immune are no longer at risk of getting the disease and should be excluded from selection. Other diseases may recur once some time has elapsed after recovery. In these situations, animals may be defined as eligible for inclusion in the study provided they have not had the disease within a defined period before the start of the study.

Cohort studies may be closed or open. A closed study is simpler: it enrols a defined selection of animals at the start, follows only those animals over time (no new animals are added), and makes every effort to follow all enrolled animals through to the end of the follow-up period. An open study allows animals to enter or exit the study population at any time, so different animals may be followed for different lengths of time.
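In an open study, animals contribute different amounts of follow-up time, so incidence is often expressed as a rate per unit of animal-time at risk rather than as a proportion. The following Python sketch shows the arithmetic using hypothetical records (day of entry, day of exit, and whether the animal became a case). It assumes each animal stops contributing time when it becomes a case or leaves the study.

```python
# Hypothetical open-cohort records: (animal ID, day entered, day left follow-up, became a case?)
# An animal leaves follow-up when it becomes a case, is sold or dies, or the study ends.
records = [
    ("cow1", 0, 100, False),   # followed for the whole 100-day study
    ("cow2", 0, 40, True),     # became a case on day 40
    ("cow3", 20, 100, False),  # entered the herd on day 20
    ("cow4", 50, 80, True),    # entered on day 50, became a case on day 80
]

def incidence_rate(records):
    """Return (number of cases, animal-days at risk, cases per animal-day)."""
    cases = sum(1 for _, _, _, is_case in records if is_case)
    animal_days = sum(exit_day - entry_day for _, entry_day, exit_day, _ in records)
    return cases, animal_days, cases / animal_days

cases, animal_days, rate = incidence_rate(records)
print(f"{cases} cases in {animal_days} animal-days = {rate * 100:.1f} per 100 animal-days")
```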

Bias, confounding and interaction

Observational studies are subject to bias. Bias occurs in an epidemiological study when the observations do not reflect the true situation because of some systematic error. Knowledge of the types of bias, and of strategies to minimise the risk of bias, is essential in designing and implementing epidemiologic studies.

Bias is any effect at any stage of an investigation tending to produce results that depart systematically from the true values, i.e. a systematic error (lack of validity) rather than a random error (lack of precision).

Although there are many different types of bias, they can be broadly classified into three general categories: selection, measurement and confounding. The differences between these categories are not always clear-cut and the strategies for preventing bias are not always exclusive to a single type. They may be viewed as a connected group of issues capable of interfering with inference.

Selection bias

Selection bias is a systematic error in the way that the samples of study units were drawn from their underlying populations, or in the way that study units were assigned to interventions.

The potential for selection bias is high in many observational studies. For example, if a cross-sectional sample of prawns from a pond consisted of the easy-to-catch prawns in the shallow water near the edge of the pond, the sample may not be representative of all prawns in the pond and selection bias would result.

If, in an intervention study, an investigator cannot describe a formal decision rule that he or she used to select subjects or assign treatments, then there is a risk of selection bias. Common types of selection bias in surveys include differences between volunteers and randomly selected subjects, and between responders and non-responders. Another important source of selection bias arises from differences in access to extension activities and technical advice. For example, prawn farms that have regular input from trained specialists are very unlikely to be representative of all prawn farms.

There are many strategies for protecting against selection bias. These include having clear criteria for subjects to be eligible for inclusion in the study (see previous section). Randomised selection uses chance to provide protection against selection bias.

Measurement bias

Measurement bias is a systematic error in the way that data were gathered or measured.

Measurement bias is often called misclassification bias. Animals may be misclassified with respect to disease (a case recorded as a control or vice versa) or with respect to a risk factor.

Misclassification may be differential (when one group is more or less likely to be misclassified than another) or nondifferential (when misclassification is equally likely in all groups).
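The direction of the resulting bias can be shown with a small calculation. In this Python sketch, all counts and the sensitivity and specificity values are hypothetical. Exposure is recorded imperfectly in a case-control study, first with the same accuracy in cases and controls (nondifferential) and then with better recall in cases (differential). Nondifferential misclassification of a binary exposure typically pulls the odds ratio towards 1, whereas differential misclassification can push it in either direction.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)

def misclassify(exposed, unexposed, se, sp):
    """Expected counts recorded as exposed/unexposed in one group, given the
    sensitivity (se) and specificity (sp) with which exposure is recorded."""
    recorded_exposed = se * exposed + (1 - sp) * unexposed
    total = exposed + unexposed
    return recorded_exposed, total - recorded_exposed

# Hypothetical true counts: (exposed, unexposed)
cases = (60, 40)
controls = (30, 70)

true_or = odds_ratio(*cases, *controls)

# Nondifferential: exposure recorded equally well in cases and controls
nd_or = odds_ratio(*misclassify(*cases, se=0.8, sp=0.9),
                   *misclassify(*controls, se=0.8, sp=0.9))

# Differential: cases recall exposure better than controls (recall bias)
diff_or = odds_ratio(*misclassify(*cases, se=0.95, sp=0.9),
                     *misclassify(*controls, se=0.7, sp=0.9))

print(f"true OR {true_or:.2f}, nondifferential {nd_or:.2f}, differential {diff_or:.2f}")
```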

For example, if you are looking for evidence of previous exposure to a chemical that is suspected of being a causal factor for a disease in fish, farms with and without the disease may be questioned about chemical use. In such a case, all farms should be questioned with equal vigour, and with equal adherence to non-leading questions, to minimise "recall bias".

When investigating causal associations, equal effort should be expended in searching for old records for the diseased and the non-diseased groups. If you are following farms or ponds that are exposed and not exposed to a suspected causal factor, you must guard against checking the exposed group more frequently, or using more sensitive methods of disease detection in the exposed group, than in the unexposed group. This is to avoid "diagnostic work-up bias".

Many research design features help to protect against measurement bias. Some of the more obvious include:

  • Blind the measurer/data collector.
  • Get better measuring equipment or tests.
  • Standardise the protocol for data collection.
  • Use prospective rather than retrospective data.
  • Use objective rather than subjective measurement criteria.


Confounding

Confounding occurs when two risk factors are interrelated and it is incorrectly concluded that one of the factors is causally related to the disease in question. For example, it might be observed that shrimp in ponds with cloudy water do not grow as well as those in clearer water. We might conclude from this that light penetration of the water is important for normal growth. However, it may be that the cloudiness is due to the presence of particular algal species in the water which inhibit growth of the shrimp through toxin production. In this theoretical example, confounding has meant that we have incorrectly concluded that light penetration is associated with poor shrimp growth when the true cause was the presence of toxic algae. The relationships are represented below.

Confounding is a systematic error that results from unaccounted-for differential distributions of particular covariates. Confounding may be viewed as a form of bias.


Example of relationships resulting in confounding leading to incorrect conclusions on the cause of poor shrimp growth

Confounding is one of the critical problems to watch for when undertaking an epidemiological study. It would probably be better to name the problem "confusing" as it occurs when the effects of two or more factors are mixed and it is difficult to determine which factors are truly "causal" in an epidemiological sense.

To be a confounder, an exposure factor must:

  • be a risk factor for the disease in question;
  • be associated with the exposure factor under study in the source population; and
  • not be affected by the exposure factor or the disease. In particular, it cannot be an intermediate step in the causal path between the exposure and the disease.
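Stratifying on a suspected confounder is one way to check for confounding. The Python sketch below uses invented pond counts for the shrimp example. Cloudy ponds are more likely to contain toxic algae, and the algae (not the cloudiness) determine growth. The crude risk ratio suggests that cloudy water is harmful, but within each algae stratum the risk ratio is 1.

```python
# Hypothetical pond counts: (ponds with poor growth, total ponds)
data = {
    "toxic algae present": {"cloudy": (40, 80), "clear": (10, 20)},
    "toxic algae absent":  {"cloudy": (2, 20),  "clear": (8, 80)},
}

def risk_ratio(exposed, unexposed):
    """Risk of poor growth in exposed ponds divided by risk in unexposed ponds."""
    (x1, n1), (x0, n0) = exposed, unexposed
    return (x1 / n1) / (x0 / n0)

# Crude analysis: ignore algae and pool the two strata
crude = {
    water: (sum(data[s][water][0] for s in data), sum(data[s][water][1] for s in data))
    for water in ("cloudy", "clear")
}
print(f"crude RR (cloudy vs clear): {risk_ratio(crude['cloudy'], crude['clear']):.2f}")

# Stratified analysis: compare cloudy vs clear within each algae stratum
for stratum, counts in data.items():
    print(f"{stratum}: RR = {risk_ratio(counts['cloudy'], counts['clear']):.2f}")
```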

Confounding is situation-specific, and you need to know something about the biology and logic of the situation to identify factors that should be explored as possible confounders. In general, in most studies you should at least consider the following kinds of variables: species, age, breed, season, sex, physiological status (eg spawning, nursing and growing) and level of production.

One of the best protections against confounding bias is randomisation. Randomisation ensures that, on average, most confounders will be distributed roughly evenly between treatment groups or sub-samples. Randomisation is particularly valuable because it is the only available method for controlling confounding due to unknown or unmeasured variables. All other methods for controlling confounding assume that you know enough to have measured the potential confounder. These other methods include:

  • Restriction of entry into the study
  • Stratification (and its extreme: matching) in the design
  • Standardisation of rates
  • Stratification in the analysis
  • Adjustment using multivariable statistical methods in the analysis
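A simple random allocation of study units to groups can be sketched as follows. This Python example assumes hypothetical pond IDs and two groups. It shuffles the list with a seeded random number generator, so the allocation can be reproduced and audited, and then deals the units out in turn to give groups of equal size.

```python
import random

def randomise(units, groups=("treatment", "control"), seed=2024):
    """Randomly allocate units to groups in (as near as possible) equal numbers."""
    rng = random.Random(seed)  # record the seed so the allocation can be checked later
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the groups in turn
    return {g: shuffled[i::len(groups)] for i, g in enumerate(groups)}

ponds = [f"pond{i:02d}" for i in range(1, 13)]  # hypothetical pond IDs
allocation = randomise(ponds)
print(allocation)
```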

Interaction (effect modification)

Where two or more risk factors play a role in the causation of a disease, the possibility exists for interaction (also called effect modification) to occur between two or more of the factors. Interaction is different from confounding.

Interaction occurs when the incidence of disease in the presence of two or more risk factors differs from the incidence expected to result from their individual effects.

When interaction occurs, the effect can be greater than what we expect (positive interaction or synergism) or less than what we would expect (negative interaction or antagonism). The problem when evaluating effect modification is to ascertain what we would expect to result from the individual impacts of the different risk factors.

For example, say we find that the incidence of epizootic ulcerative syndrome (EUS) is:

  • 5% in fish in ponds with acidic water and a smooth lining;
  • 2% in fish in ponds with non-acidic water and a rough lining; and
  • 15% in fish in ponds with acidic water and a rough lining.

The 15% incidence is a lot higher than we would expect if the two factors of acidity and rough pond lining operated independently to increase the risk of EUS. We would therefore suspect synergy between these two risk factors and would need to investigate further.
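What we "expect" depends on the incidence in ponds with neither factor and on the scale used, and the example does not specify either. The Python sketch below assumes, purely for illustration, an incidence of 1% in ponds with non-acidic water and a smooth lining. It computes the expected joint incidence if the factors act independently on an additive scale (R10 + R01 - R00) and on a multiplicative scale (R10 × R01 / R00). With this assumed baseline, the observed 15% exceeds both expectations.

```python
# Incidence of EUS (as proportions) from the example, plus an ASSUMED baseline
r_neither = 0.01        # hypothetical: non-acidic water, smooth lining (not given in the text)
r_acid_only = 0.05      # acidic water, smooth lining
r_rough_only = 0.02     # non-acidic water, rough lining
r_both_observed = 0.15  # acidic water, rough lining

# Expected joint incidence if the two factors act independently on an additive scale
expected_additive = r_acid_only + r_rough_only - r_neither
# ... or independently on a multiplicative (risk ratio) scale
expected_multiplicative = r_acid_only * r_rough_only / r_neither

print(f"expected (additive): {expected_additive:.0%}")
print(f"expected (multiplicative): {expected_multiplicative:.0%}")
print(f"observed: {r_both_observed:.0%}")
```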

In complex epidemiological studies, information is often collected on a wide variety of factors to identify the important risk factors for the disease of interest.

Assessment of interaction is commonly done during analysis and requires more advanced statistical skills.

Sample size

The number of measurements or animals included in a study (sample size) can influence a variety of measures, including variance, confidence intervals and statistical significance. The smaller the sample size, the more likely it is that analyses will produce results of little use in identifying the causes of a disease. A study that specifically assesses one or more risk factors may fail to show any association with disease, even though the same study performed with a larger sample size might have identified those risk factors as causes of disease.
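The effect of sample size can be illustrated with an approximate power calculation, where power is the probability that a study will detect a real difference. This Python sketch uses the normal approximation for comparing two proportions with a two-sided test. The prevalences (a risk factor present in 60% of cases and 30% of non-cases) are hypothetical. Under these assumptions, a study with 10 animals per group would usually fail to detect the difference, while one with 50 per group would usually succeed.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions with
    n animals per group (normal approximation, no continuity correction)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    p_bar = (p1 + p2) / 2
    numerator = abs(p1 - p2) * sqrt(n) - z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    return z.cdf(numerator / sqrt(p1 * (1 - p1) + p2 * (1 - p2)))

# Hypothetical: risk factor present in 60% of cases but only 30% of non-cases
for n in (10, 30, 50):
    print(f"n = {n} per group: power = {approx_power(0.6, 0.3, n):.0%}")
```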

In practice the number of samples that may be able to be collected will be limited by available resources (labour, time, budget, sample storage and testing capacity).

It is possible to perform calculations before a study is carried out to estimate the likely sample size required. The EpiTools website has sample size calculators for a variety of study types, including cohort and case-control studies.

The following information provides some simple rules of thumb for sample sizes for a field study aiming to identify a possible causal agent for a disease outbreak.

Where there is no single clear diagnosis for the disease or the disease agent has not been identified, there may be interest in conducting additional tests to confirm the specific disease and identify the infectious agent (if the disease is considered likely to be infectious). This will typically involve laboratory testing of samples collected from animals that meet the case definition and from a comparison group of non-cases. The laboratory testing will generally look for a candidate agent (virus or bacterium) and compare the prevalence of positive results in the two groups. If the agent is rarely detected in the non-case group and frequently detected in the case group, this would support the hypothesis that the infectious agent is a cause of the disease.

At least 10 animals at each stage of disease should be examined but, if resources permit, this number should be increased to as many as 30. Statistical methods can then be used to help identify which pathogen is the most likely cause.

The table below shows the number of animals that need to be examined to provide data for statistical analysis of the association between disease and possible causal factors. Such analysis can help identify which cause, from a list of possible causes, is the most likely.

The dark shaded boxes show that 25-30 animals per case and non-case group are needed to ensure a high probability of identifying a difference between groups.

The light shaded boxes show the differences in prevalence that can be detected with a sample size of approximately 10 animals per group. In particular, where the difference between cases and non-cases is large (top left corner), only small numbers of animals are required to provide a high level of confidence that the observed association is not due to chance. In contrast, very large numbers are required if the difference between groups is likely to be small (diagonal from bottom left to top right).

Number of animals per group to examine to determine if a particular finding is more common in cases than non-cases (95% confidence, 80% power, equal sizes for case and non-case groups and assuming a two-tailed test).

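The kind of calculation behind such tables can be sketched in Python. This example uses one common normal-approximation formula for comparing two proportions, with an optional continuity correction. The table above may have been produced with a different method, so results will not necessarily match its cells exactly. The sketch is for exploring scenarios, not a replacement for a dedicated calculator such as those on EpiTools.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80, continuity=True):
    """Animals needed per group to detect a difference between proportions p1 and p2
    (two-sided test, equal group sizes, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    d = abs(p1 - p2)
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / d ** 2
    if continuity:
        # A commonly used continuity correction, which increases n slightly
        n = n / 4 * (1 + sqrt(1 + 4 / (n * d))) ** 2
    return ceil(n)

# Hypothetical scenario: finding present in 87% of cases and 47% of non-cases
print(n_per_group(0.87, 0.47))
```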

Suppose we had taken specimens for detailed laboratory examination from 30 cases of a particular syndrome and 30 non-cases, as described above, with the following microbiological results:

Organism | Cases infected, n (%) | Non-cases infected, n (%) | Number per group needed to detect the observed difference
Organism 1 | 19 (63) | 14 (47) | 408
Organism 2 | 26 (87) | 14 (47) | 25
Organism 3 | 27 (90) | 25 (83) | 219

From these results, and with reference to the table above, Organism 2 is the only organism that is statistically associated with an animal being a case, even though Organism 3 was isolated more frequently from cases. The reason for this conclusion is that a sample size of 30 is insufficient to detect a statistically significant difference between the isolation rates from cases and non-cases for Organisms 1 and 3, but is sufficient for Organism 2. This does not 'prove' that Organism 2 is the primary pathogen (it could be an opportunistic, secondary invader), but by examining a reasonable number of cases and non-cases (in this case, 30 of each) we are much better able to understand the relative importance of the three organisms.
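The manual does not state which statistical test underlies this conclusion. One option is Fisher's exact test, which is well suited to small 2x2 tables. The Python sketch below implements a two-sided version from the hypergeometric distribution and applies it to the counts in the table above. With these counts, only Organism 2 gives p < 0.05, which is consistent with the conclusion in the text.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    p_value = sum(prob(x) for x in range(max(0, col1 - row2), min(row1, col1) + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))  # small tolerance for rounding
    return min(p_value, 1.0)

# Counts from the table: (cases infected, cases not infected,
#                         non-cases infected, non-cases not infected)
results = {
    "Organism 1": (19, 11, 14, 16),
    "Organism 2": (26, 4, 14, 16),
    "Organism 3": (27, 3, 25, 5),
}
for organism, table in results.items():
    print(f"{organism}: p = {fisher_exact_two_sided(*table):.3f}")
```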

Planning a field epidemiological study

It is important to use a structured and systematic approach.

Steps in designing and analysing field epidemiologic studies. From Gregg 2002.


Identify the scope and responsibilities for any investigation

The first step in any epidemiological analysis is to clearly define the problem and the scope, context and expected outcomes of the investigation. This might include determining if there is a disease problem and, if there is, to:

  • determine the extent and impact of the problem
  • identify possible and probable cause(s) and source(s) of the problem
  • identify likely risk factors for the disease
  • make recommendations for control and/or treatment and for future prevention

Where the analysis is undertaken at the request of a third party (for example government policy makers), it is important that any request is documented and that the terms of reference are clear and unambiguous.

During planning, it is also important to have clearly defined responsibilities (who is doing what and by when), deliverables (reports, software, information management system), and a detailed budget.

SMART objectives

Project objectives define the specific questions that the project will be expected to answer.

If the objective of an investigation is to estimate the prevalence of white spot syndrome virus in shrimp breeding stock, the study design should be directed at this objective, not at identifying risk factors or looking for other viruses.

Before proceeding with the study you should (in consultation with others involved) define the objectives and expected outcomes of the study. SMART objectives are:

  • Specific: meaning they are clear and well-defined. On completion of the investigation, it should be a straightforward process to determine whether or not the objectives have been achieved.
  • Measurable: meaning each objective is associated with an outcome that can be measured to allow you to monitor and quantify progress toward achieving each objective and so you and others can determine when the objective has been achieved.
  • Achievable: meaning that the objectives are practical and feasible and likely to be achieved with the skills and resources defined in the project.
  • Relevant: meaning each objective should be relevant to the overall project goal or aim. Objectives that are not relevant risk wasting effort on producing a result that is subsequently ignored.
  • Time-bound: meaning that each objective should include a timeline and milestones to be achieved within a given timeframe. Failure to specify a timeframe risks a project being continually delayed while projects that are perceived to be more urgent (those with specific deadlines) are progressed.

As the objectives are being developed it is important to consider and plan for how data and information may be collected and in what form. This information will in turn drive the types of analyses that will need to be done.

Searching the literature and other sources

A review of scientific literature may be performed as part of the initial investigation or to gather information when preparing or designing additional studies. A literature search might be useful to:

  • identify previous studies that are relevant to the current task;
  • gather additional data that might be of use in supplementing existing data for the study;
  • develop a differential diagnosis list in a disease outbreak of unknown cause;
  • see how others have approached similar tasks; and
  • gather additional information to support your conclusions.

With widespread access to the internet and library services, searching for information is now relatively easy. Most search engines search using keywords that you enter and searches may be conducted against the title or abstract of a paper, authors or the entire content.

Searches can be refined by adding more terms and constructing logical search statements. Different search engines handle multiple terms differently, often using an 'advanced search' page to set detailed search parameters. In PubMed and Medline, terms can be combined in a search statement using the AND and OR logical operators (which PubMed requires in upper case). For example: 'dogs AND hepatitis'; '"johne's disease" OR paratuberculosis'. If AND and OR operators are combined in one statement, the order of processing matters: PubMed processes operators from left to right, except that terms grouped in parentheses are processed first as a unit.

Example: [cattle AND Johne's disease OR paratuberculosis] is different from [cattle AND (Johne's disease OR paratuberculosis)]. The first statement will retrieve all resources on Johne's disease in cattle plus resources on paratuberculosis in any species, while the second returns only resources relating to Johne's disease or paratuberculosis in cattle.
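The difference between the two statements can be shown with set operations, treating each search term as the set of papers indexed under it. The paper numbers below are hypothetical. AND corresponds to set intersection (`&`) and OR to union (`|`). Paper 2 (paratuberculosis, but not in cattle) is retrieved only by the first statement.

```python
# Hypothetical index: which (made-up) papers are tagged with each search term
index = {
    "cattle": {1, 3},
    "johne's disease": {1, 4},
    "paratuberculosis": {2, 3},
}
cattle = index["cattle"]
johnes = index["johne's disease"]
paratb = index["paratuberculosis"]

# [cattle AND johne's disease OR paratuberculosis], grouped as (cattle AND johne's) OR paratb
first = (cattle & johnes) | paratb
# [cattle AND (johne's disease OR paratuberculosis)]
second = cattle & (johnes | paratb)

print(sorted(first))   # [1, 2, 3]
print(sorted(second))  # [1, 3]
```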

The ready availability of information via the internet means that often the bigger problem is not just finding information but finding those sources that are most relevant to your needs. It is important to compose and refine searches carefully, to make them highly specific for the desired topic. If this is not done, a large number of non-relevant articles are likely to be listed, making it very difficult to identify the important ones for closer scrutiny.

For example, a search on PubMed for "Johne's disease" returns more than 800 matches. By refining the search to find references about vaccines in cattle ("Johne's disease" AND cattle AND vaccine), this list can be reduced to fewer than 50. Additional terms can be added to further refine the search as necessary.

At the same time it is important not to get too specific, in case important papers have not been indexed on all the terms you have used.

Once a list of potential sources has been identified, selected items can usually be saved to a text file, or often to a reference manager. Abstracts of papers listed on PubMed and Medline are often available online free of charge, but copies of the full papers will usually need to be either purchased online, or obtained as downloads or photocopies through a library service (usually government agencies or universities).

A useful feature of Medline through Current Contents (for those with access to this service) is that it is possible to save regularly used searches for re-use or to be run on a weekly basis by the system, with new results each week forwarded as a text file to your email address. This feature is particularly useful if there are subject areas where you wish to stay abreast of the latest developments on an ongoing basis.

Data collection

Once it is clear what is required, the next step is to plan for collection of information and data.

It is expected that most epidemiologic studies will focus on collecting raw data by making observations or measurements on animals or collecting samples for laboratory testing.

Data and information may also be obtained from talking to producers or other people, from the literature review, from government statistics on animal production and laboratory testing (including iSIKHNAS data).

Information gathered when investigating salmonellosis in sheep feedlots may come from a literature search, from which it is concluded that Salmonella is acquired orally and that exposure dose is important. This may lead to the identification of simple measures such as feeding in raised troughs, which prevents faecal contamination of feed, and ensuring good drainage to prevent slurry build-up. Alternatively, data might be available from feedlot and veterinary records, providing facts about cases (and non-cases) of salmonellosis that have occurred. These data would then need to be collated, summarised and interpreted to generate information from which to draw conclusions.

For an outbreak investigation, relevant data could include quantitative data on individual cases of disease; case histories on individual animals (both cases and non-cases); veterinarians' (or others') observations and impressions of cases; laboratory reports on testing undertaken on affected and unaffected animals; and potential sources of disease (such as samples of feed, water, soil and the environment).

In other cases, the available data could comprise a series of paper files describing the issue of concern and providing relevant historical data. These files need to be read, collated and summarised to put the data into a form that can be easily understood and interpreted.

Entering, editing and analysing data

Once the relevant data are collected it is necessary to enter, collate and edit or check the data prior to any formal analyses.

Editing data

It is assumed that data are likely to be entered into either a spreadsheet or database for routine data management and preparation for subsequent analyses.

Where data are entered from paper records such as questionnaires or printed material, it may be useful to select a representative number of cells and check entered values against paper records as a form of quality assurance.

Organisation of data into tabular format in preparation for analyses


A variety of simple checks should be performed to detect errors and implausible or inconsistent values. Each column can be sorted and the top and bottom rows inspected for values that may be outliers or implausible (for example, a cow recorded as weighing 45 kg).

Coding should be checked for all categorical variables. If sex is coded as M=male, MC=male castrate and F=female, then check to see if there are any cells with values that are not on this coding list. There may be cells entered as MF, male instead of M, heifer instead of female etc.

Often it is useful to examine two variables in combination for logic checks to detect problems like a female animal that is recorded as having been castrated, a non-pregnant animal recorded as having calved etc.
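The range, coding and logic checks described above can be automated once the data are in tabular form. The Python sketch below works on a list of records. The field names, the coding list and the plausible weight range are assumptions for illustration and would need to match your own dataset.

```python
VALID_SEX_CODES = {"M", "MC", "F"}  # M = male, MC = male castrate, F = female
PLAUSIBLE_WEIGHT_KG = (150, 1200)   # hypothetical range for adult cattle; adjust to the study

def check_records(records):
    """Return a list of (animal ID, problem) tuples for values that fail simple checks."""
    problems = []
    low, high = PLAUSIBLE_WEIGHT_KG
    for r in records:
        # Coding check: the categorical value must be on the coding list
        if r["sex"] not in VALID_SEX_CODES:
            problems.append((r["id"], f"sex code {r['sex']!r} not on coding list"))
        # Range check: flag implausible values
        if not low <= r["weight_kg"] <= high:
            problems.append((r["id"], f"implausible weight {r['weight_kg']} kg"))
        # Logic check across two variables
        if r["sex"] == "F" and r["castrated"]:
            problems.append((r["id"], "female recorded as castrated"))
    return problems

# Hypothetical records
records = [
    {"id": 1, "sex": "F", "weight_kg": 420, "castrated": False},
    {"id": 2, "sex": "male", "weight_kg": 510, "castrated": False},
    {"id": 3, "sex": "F", "weight_kg": 45, "castrated": True},
]
for animal_id, problem in check_records(records):
    print(animal_id, problem)
```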

Developing the analytical approach

The design (way the data were collected) will inform the analytical approach. If the data were collected using a case-control study design then you should be planning to do 2x2 tables with odds ratios and possibly logistic regression to analyse the data. If the data were collected using a cross-sectional or cohort study design then you may be able to use 2x2 tables with relative risk measures and possibly other types of advanced analyses.

You should identify the outcomes of interest in the dataset, exposures or risk factors of interest and other variables that may be useful as confounders.

Complete descriptive or exploratory analyses. These may include simple summaries of the number of records in the dataset; a description of each variable with its coding system and type of data (continuous, ordinal, nominal or categorical); the number of missing values in each variable; summary statistics for each variable (mean/median, counts by category etc); and the start and end dates of data collection.
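A basic descriptive summary can be produced with the Python standard library alone. In this sketch the records and variable names are hypothetical and `None` marks a missing value. Numeric variables are summarised by mean and median, and other variables by counts per category.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical dataset; None marks a missing value
records = [
    {"age_years": 2.5, "sex": "F", "diarrhoea": "yes"},
    {"age_years": 4.0, "sex": "M", "diarrhoea": "no"},
    {"age_years": None, "sex": "F", "diarrhoea": "no"},
    {"age_years": 1.5, "sex": "F", "diarrhoea": None},
]

def describe(records):
    """Summarise each variable: missing count plus mean/median (numeric) or counts (categorical)."""
    summary = {}
    for var in records[0]:
        values = [r.get(var) for r in records]
        present = [v for v in values if v is not None]
        info = {"missing": len(values) - len(present)}
        if present and all(isinstance(v, (int, float)) for v in present):
            info.update(mean=mean(present), median=median(present))
        else:
            info["counts"] = dict(Counter(present))
        summary[var] = info
    return summary

print(f"{len(records)} records")
for var, info in describe(records).items():
    print(var, info)
```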

Simple analyses

2x2 tables (also called contingency tables or cross-tabulations) are simple and easy to perform and form the mainstay of initial analyses of field epidemiology data. Larger tables of attack rates can be produced for various risk factors.

Where data can be structured into a binary coding for disease (disease present, disease absent) and where the risk factor under consideration can also be classified using a binary approach (absent or present), then counts of the number of animals in each of these combinations can be entered into a 2x2 table and analysed to produce either odds ratio or relative risk estimates and associated confidence intervals and p-values.
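A minimal Python sketch of these calculations is shown below, using hypothetical counts. It uses the common log (Woolf) method for 95% confidence intervals, which requires every cell to be greater than zero. Only the odds ratio is appropriate for case-control data. The second table has the same proportions as the first but three times as many animals, so the point estimates are identical while the confidence intervals are narrower.

```python
from math import exp, log, sqrt

Z95 = 1.96  # standard normal quantile for a 95% confidence interval

def two_by_two(a, b, c, d):
    """a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases.
    Returns OR and RR, each as (estimate, lower 95% limit, upper 95% limit).
    All cells must be greater than zero."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    risk_ratio = (a / (a + b)) / (c / (c + d))
    se_log_rr = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))

    def ci(estimate, se):
        # Confidence limits are calculated on the log scale and back-transformed
        return exp(log(estimate) - Z95 * se), exp(log(estimate) + Z95 * se)

    return {"OR": (odds_ratio, *ci(odds_ratio, se_log_or)),
            "RR": (risk_ratio, *ci(risk_ratio, se_log_rr))}

# Hypothetical counts; the second table has the same proportions with three times as many animals
for table in [(20, 10, 10, 20), (60, 30, 30, 60)]:
    for measure, (est, lo, hi) in two_by_two(*table).items():
        print(f"{table} {measure} = {est:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```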

Statistical tests provide a p-value, which is the probability of obtaining results at least as extreme as those observed by chance alone if there were no association between the risk factor and disease. Where the p-value is greater than a defined threshold (alpha = 0.05, or in some cases 0.1), we interpret the findings as not significant: the result could have occurred by chance alone, and the evidence does not support an association between the risk factor and disease. When the p-value is less than the threshold, we interpret the finding as significant and as evidence of an association between the risk factor and disease.
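For a 2x2 table, one widely used test is the Pearson chi-square test with one degree of freedom. The Python sketch below computes it without a continuity correction for hypothetical counts. The p-value uses the fact that, for one degree of freedom, P(chi-square > x) = erfc(sqrt(x/2)). As a common rule of thumb, if any expected cell count is below 5, an exact test (such as Fisher's exact test) is preferable.

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom, no continuity correction)
    for a 2x2 table, and its p-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

chi2, p = chi_square_2x2(20, 10, 10, 20)  # hypothetical counts
alpha = 0.05
print(f"chi-square = {chi2:.2f}, p = {p:.4f}, significant at {alpha}: {p < alpha}")
```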

It is important to note that statistical tests sometimes return a non-significant result even though the factor is associated with disease. This is more likely to occur when the sample size is small. Small sample size alone should not bias the point estimate of an OR or RR, though it widens the confidence interval and increases the p-value.

From an epidemiological perspective, unadjusted estimates of relative risks or odds ratios, with their confidence intervals, may be more useful than technically more advanced multivariable statistical analyses. This is particularly the case in the early stages of a complex disease investigation. Over time, as more carefully planned studies are designed and implemented, advanced analyses may become more appropriate, but these will generally take time to plan and perform.

It is also important to note that statistical significance or meaningful RR or OR estimates do not necessarily provide proof of causation. They provide evidence of statistical association. Where there is care and attention in the design of the study you may have more confidence in the causal interpretation of the results of statistical tests.

More advanced statistical analyses

Statistical advice should be sought before proceeding to more advanced analyses.

It is possible to perform a stratified analysis of 2x2 tables using the Mantel-Haenszel procedure to adjust for the confounding or modifying (interaction) effect of another factor. The result may be an adjusted overall measure of association or separate measures of association for each level of the other factor.
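The Mantel-Haenszel summary odds ratio is the sum of a×d/n over the strata divided by the sum of b×c/n over the strata. The Python sketch below uses hypothetical counts in which the odds ratio is 2 in both age strata but the crude (collapsed) odds ratio is only 1.3, showing how confounding can mask an association. Pooling into one summary estimate is only sensible when the stratum-specific estimates are similar. If they differ markedly (interaction), report them separately.

```python
# Hypothetical strata; each is (a, b, c, d) with
# a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases
strata = {
    "young": (20, 80, 10, 80),
    "old": (40, 20, 50, 50),
}

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel summary odds ratio across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

for name, table in strata.items():
    print(f"{name}: OR = {odds_ratio(*table):.2f}")

# Crude OR: collapse the strata into a single 2x2 table
crude = tuple(sum(cells) for cells in zip(*strata.values()))
print(f"crude OR = {odds_ratio(*crude):.2f}")
print(f"Mantel-Haenszel OR = {mantel_haenszel_or(strata.values()):.2f}")
```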

Finally more advanced statistical models may be used to analyse larger or more complex datasets to produce adjusted measures of association between multiple factors in one model. There are a number of benefits of multivariable modelling in understanding associations between many factors and disease. The effects of any one factor in the model are adjusted for all other effects in the model, effects of interactions and confounding can be incorporated in the model and models can be expanded to incorporate dependencies amongst observations (clustering of units).

Logistic regression is very commonly used to analyse epidemiologic data from disease investigations where the disease outcome can be represented as a binary variable (0=no disease, 1=disease) and where multiple risk factors are being considered. Other types of analyses that may be considered include Poisson or negative binomial regression for count data (counts of the number of cases of disease), survival analysis for time to event (time to occurrence of disease) and possibly linear models when an outcome of interest is continuous (effect of disease status on body weight or growth).

Use of other information

In many cases, you could be asked to synthesise available information and make conclusions and recommendations with very little (or without any) quantitative data to analyse. In such situations the 'data' is likely to consist of paper files, case reports, subjective observations or other 'soft' data.

Qualitative data is not amenable to the numerical methods used to summarise and make inference from quantitative data. Instead, a qualitative analysis is required, following a series of systematic steps, such as:

  • thorough review and summarisation of the available material
  • identification of consistent patterns or anomalies in the data
  • identification of strengths and limitations in the data
  • identification of likely and logical explanations for the observed patterns

It is usually not possible to make definitive statements about cause-and-effect or other specific relationships. However, particularly where the data is of a reasonable quality and consistency it is often possible to arrive at a conclusion as to the most likely explanation (or a group of likely/possible explanations) for the observed patterns.

Interpreting field data and information

Elevated relative risk or odds ratio estimates may provide suspicion about possible causes of disease but should be interpreted with caution. Chance, bias, confounding and other sources of error (data entry error, incorrect analyses etc) should all be considered as alternative explanations for elevated or significant measures of association.

Where the data were derived from a carefully planned study with design attributes intended to prevent bias and other problems and where the data management and analyses have been conducted appropriately and the results have produced significant measures of association with meaningful relative risk (or odds ratio) values, and where the findings are biologically plausible and consistent with other studies or findings, then you may have increased confidence in the findings.

In many cases you will be expected to draw conclusions and make recommendations based on less than perfect data/information. When this happens it is essential not only to recognise the limitations of the available data and information, but also to continue with those analyses that the data will support and draw what conclusions you can. In many cases, your recommendations are likely to include collection of additional data to provide further support (or otherwise) for your preliminary conclusions.

In 1994, an incident occurred in Queensland in which a previously unidentified virus (since characterised as Hendra virus) was responsible for the deaths of 14 horses and one human (a second infected human subsequently recovered), associated with a single racehorse stable (Baldock et al., 1996). During the investigation it rapidly became apparent that this was a previously unidentified disease and that the aetiology was unknown. However, even before the causal virus was identified, it was possible to determine that the disease was probably infectious in nature, was most likely to be directly transmitted, was not highly contagious (either among horses or humans), and probably originated from an as-yet unidentified wildlife reservoir (Baldock et al., 1995). Approximately one year after the Hendra outbreak, flying foxes (fruit bats) were identified as the presumptive natural host of the virus, with about 14% of sampled flying foxes being seropositive (Baldock et al., 1996). The virus was subsequently isolated from the uterine fluids of a flying fox (Halpin et al., 1996). Flying foxes were known to feed in trees in a spelling paddock associated with the stable, in which the index case had been grazing before becoming sick. The specific mechanism of transmission among bats and from bats to horses is still not known.

In the Hendra virus example, virtually no quantitative data were available for analysis. Despite this, investigators critically reviewed and interpreted the findings of the medical and veterinary investigations of affected animals and humans. This produced a remarkably accurate picture of what happened, and of the cause and source of the outbreak.

Preparing the report

Effective communication of the findings of your investigation to the appropriate decision makers is critical. If the findings are not communicated in a manner that allows key stakeholders to understand the results and use the information to make good decisions, then the effort will have had little benefit.

A final report from any epidemiologic study should be prepared in a systematic manner with sections following scientific convention: introduction, outline of objectives, materials and methods, results, discussion and bibliography.

It is important to use a structured and systematic approach, and always to ensure that the findings are consistent with the interpreted information and data available at the time. Describe and record your methods and findings so that any conclusions and recommendations are easily understood. Others should also be able to follow clearly how you arrived at these conclusions. This transparency is essential so that those responsible for implementing any response to your recommendations understand the basis and limitations of the conclusions.

References and other resources

Baldock, F. C., Douglas, I. C., Halpin, K., Field, H., Young, P. L. & Black, P. F. 1996. Epidemiological investigations into the 1994 equine morbillivirus outbreaks in Queensland, Australia. Singapore Veterinary Journal, 29:57-61.

Baldock, F. C., Pearse, B. H. G., Roberts, J., Black, P., Pitt, D. & Auer, D. 1995. Acute equine respiratory syndrome (AERS): The role of epidemiologists in the 1994 Brisbane outbreak. In: Proceedings of the Australian Association of Cattle Veterinarians. Melbourne, Australia. Brisbane, Australia: Australian Association of Cattle Veterinarians, 174-177.

Hill, A. B. 1965. The environment and disease: Association or causation? Proc R Soc Med, 58:295-300.

Cleland, P. C., Chamnanpood, P. & Baldock, F. C. 1991. Investigating the epidemiology of foot and mouth disease in northern Thailand. In: DJ, K., ed. Epidemiology Workshop: Supplement to Epidemiology at work. Proceedings 176, September 1991. Tanunda SA. Sydney Australia: Postgraduate Committee in Veterinary Science University of Sydney, 17-32.

Cutler, S. J., Fooks, A. R. & van der Poel, W. H. M. 2010. Public health threat of new, re-emerging, and neglected zoonoses in the industrialised world. Emerging Infectious Diseases, 16:1-7.

Dohoo, I., Martin, W. & Stryhn, H. 2010. Veterinary Epidemiologic Research, Charlottetown, Prince Edward Island, Canada, VER Inc.

Frerichs, R. R. 2001. History, maps and the internet: UCLA's John Snow site. Society of Cartographers Bulletin, 34:3-7.

Gregg, M. B. 2002. Field Epidemiology, New York, New York, Oxford University Press.

Halpin, K., Young, P. & Field, H. 1996. Identification of likely natural hosts for equine morbillivirus. Communicable Diseases Intelligence, 20:476.

Hueston, W. D. 2003. Science, politics and animal health policy: Epidemiology in action. Preventive Veterinary Medicine, 60:3-13.

Kirkland, P. D., Love, R. J., Philbey, A. W., Ross, A. D., Davis, R. J. & Hart, K. G. 2001. Epidemiology and control of Menangle virus in pigs. Australian Veterinary Journal, 79:199-206.

Lessard, P. 1988. The characterization of disease outbreaks. The Veterinary Clinics of North America: Food Animal Practice, 4:17-32.

Love, R. J., Philbey, A. W., Kirkland, P. D., Davis, R. J., Morrissey, C. & Daniels, P. W. 2001. Reproductive disease and congenital malformations caused by Menangle virus in pigs. Australian Veterinary Journal, 79:192-198.

Martin, S. W., Meek, A. H. & Willeberg, P. 1987. Veterinary Epidemiology, Ames, Iowa, Iowa State University Press.

Mayer, D. (ed.) 2004. Essential Evidence Based Medicine, 2nd edition. Cambridge UK: Cambridge University Press.

Morris, R. S., Wilesmith, J. W., Stern, M. W., Sanson, R. L. & Stevenson, M. A. 2001. Predictive spatial modelling of alternative control strategies for the foot-and-mouth disease epidemic in Great Britain. Veterinary Record, 149:137-144.

Rothman, K. J., Greenland, S. & Lash, T. 2008. Modern Epidemiology, Philadelphia, PA, Lippincott Williams & Wilkins.

Snow, J. 1855. On the mode of communication of cholera, London: John Churchill.

Stephen, C. & Ribble, C. S. 1996. Marine anemia in farmed chinook salmon (Oncorhynchus tshawytscha): Development of a working case definition. Preventive Veterinary Medicine, 25:259-269.

Thrusfield, M. 2005. Veterinary Epidemiology, 3rd edition, Oxford, Blackwell Science.

Software resources

EpiCalc, 2000, v 1.2. A multi-function statistical calculator that works with pre-tabulated data. Available from:

EpiInfo, v 3.3. Database and statistics software for public health professionals. Available from:

EpiTools, 2004. AusVet's on-line epidemiological calculators and utilities. Available at:

WinEpiscope, v 2.0. Software for quantitative veterinary epidemiology. Available at:

Internet search engines and databases

Some of the commonly used, web-based, scientific databases include:

  • Medline/PubMed - This indexes all major medical, veterinary, epidemiological and associated journals, and is freely available to all users through PubMed.
  • Medline is also available through Current Contents and other service providers via institutional library subscriptions.
  • ScienceDirect provides indexing and search facilities for a wide variety of scientific journals in the physical, life, health and social sciences.
  • BIOSIS Previews/Web of Knowledge indexes a wide variety of journals, conference proceedings, books, review articles, etc., in the broad life sciences area. Available through institutional subscription.
  • SciVerse Scopus claims to be the world's largest abstract and citation database of peer-reviewed literature and quality web sources, covering a multitude of topics. It is available through institutional subscription, and some publishers provide temporary access to reviewers of journal papers.
  • Agricola is the catalogue of the National Agricultural Library of the USA and provides citations and abstracts for an extensive collection of agricultural literature.
  • CAB Abstracts includes over 6.3 million records from 1973 onwards, with over 300,000 abstracts added each year, covering agriculture, environment, veterinary sciences, applied economics, food science and nutrition. Access is via institutional subscription or by time-based payment.
  • JSTOR indexes more than 1,000 refereed journals from a wide variety of disciplines, including aquatic, biological and health sciences and statistics. Available through institutional or individual subscription.
  • SIGLE (System for Information on Grey Literature in Europe) indexes more than 700,000 bibliographical references to grey literature produced in Europe. This includes research reports, doctoral dissertations, conference papers, official publications and other non-refereed publications. Open access to all users.

More general search engines include:

  • Scirus - This is a broader search engine covering a wide range of scientific information across disciplines and publication types. Scirus covers not only scientific journals but also web publications and a range of other non-refereed sources.
  • Google Scholar also supports broad searches of the academic and scientific literature. It searches across many disciplines and sources and ranks documents according to relevance and quality or frequency of citation.
  • Google and other internet search engines can also be used, but the content they return is not restricted in any way other than by your search terms. These engines will return news items, personal web pages and any other internet content that is relevant to the search criteria (and some that is not!).