There are numerous ways to collect data when conducting research, and the study design depends greatly on the nature of the research question. Statistics provides two key approaches to collecting data: sampling and controlled experiments. Sample surveys are one kind of observational study. When researchers have specific questions but no data to answer them, they must produce data either through observational studies or through controlled experiments. The design of experiments was discussed in a previous post. In this post, we discuss observational studies, in particular the designs of three major types: cross-sectional studies, case-control studies and cohort studies. Our presentation begins with three real-life examples and concludes with a general discussion.
Example 1 – Soda and Hypertension
Researchers from the School of Public Health at Imperial College London studied the diets of nearly 2,700 middle-aged people in the U.S. and the U.K. The focus of the study was the relationship of sugar and sugar-sweetened beverages (SSBs) to hypertension (a cardiovascular risk factor).
Data collected included four 24-hour dietary recalls, two 24-hour urine collections, eight blood pressure (BP) readings and questionnaire data for 2,696 people aged 40 to 49 from the U.S. and U.K.
Results of the Study
Drinking at least one soda or other SSB a day was associated with higher blood pressure, and drinking more than one soda or SSB per day was associated with even higher BP. Each additional SSB serving (355 mL per day) was associated with a systolic/diastolic blood pressure reading higher by 1.6/0.8 mm Hg.
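To make the per-serving figure concrete, the sketch below scales the reported association (1.6 mm Hg systolic and 0.8 mm Hg diastolic per additional 355 mL serving per day) to a chosen number of extra servings. It assumes the association is linear over the range considered, which is a simplification made only for illustration, and the helper name `bp_difference` is hypothetical. The result describes an expected difference in average BP between groups, not the effect of changing one person's diet.

```python
# Per-serving association reported in Example 1 (mm Hg per extra 355 mL SSB per day).
SYSTOLIC_PER_SERVING = 1.6
DIASTOLIC_PER_SERVING = 0.8


def bp_difference(extra_servings_per_day: float) -> tuple[float, float]:
    """Hypothetical helper: expected systolic/diastolic difference (mm Hg)
    associated with extra daily servings, ASSUMING a linear association."""
    return (SYSTOLIC_PER_SERVING * extra_servings_per_day,
            DIASTOLIC_PER_SERVING * extra_servings_per_day)


sys_diff, dia_diff = bp_difference(2)
print(f"2 extra servings/day: +{sys_diff:.1f}/+{dia_diff:.1f} mm Hg")
```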
Example 2 – Does living near power lines cause leukemia in children?
This study, sponsored by the National Cancer Institute, compared 638 children who had leukemia with 620 children who did not. The children with leukemia had been diagnosed before the age of 15. The children without leukemia were individually matched to the leukemia participants according to place of residence, age and race.
The goal was to determine the relationship between residential exposure to magnetic fields of the kind produced by power lines and leukemia status in children. The researchers measured the magnetic fields in the children’s bedrooms, in other rooms and at the front door. They also recorded information about power lines near the family home and near the home where the family had lived while the mother was pregnant with the child.
After 5 years of research, the study concluded that the risk of childhood leukemia was not linked to residential magnetic field levels.
Example 3 – The Framingham Heart Study
The Framingham Heart Study began in 1948 by following 5,209 men and women between the ages of 30 and 62 from the town of Framingham, Massachusetts, who had not yet developed overt symptoms of cardiovascular disease or suffered a heart attack or stroke. Since then, the study has added the descendants of the original participants, and it is still ongoing. The goal of the study was to identify the common factors or characteristics that contribute to cardiovascular disease (CVD).
The Framingham Heart Study has followed CVD development over a long period of time in three generations of participants. Over the years, careful monitoring of these participants has led to the identification of major CVD risk factors such as high blood pressure, elevated blood triglyceride and cholesterol levels and smoking along with age and gender. Other areas of investigation include risk factors for physiological conditions such as dementia as well as the relationships between physical traits and genetic patterns.
The study described in Example 1 is a cross-sectional study. In this study, the exposure (intake of sugar and SSBs) and the health status (blood pressure and hypertension) were measured at essentially one point in time in order to estimate the relationship between the exposure status and the health status.
In general, cross-sectional studies are used to measure an outcome of interest and a set of explanatory variables at one point in time in order to estimate the relationship between the outcome and the variables.
In a cross-sectional epidemiological study, disease status (the outcome of interest) and exposure status (the explanatory variables) are measured simultaneously in a representative sample from a population. The goal is to provide a snapshot of the frequency and characteristics of a disease in a population at one point in time. This type of study can be used to assess the prevalence of acute and chronic conditions in a population.
One important point is that the measurements are made over a relatively short period of time. Thus it may not be possible to determine whether the exposure preceded or followed the disease, so this kind of study may not provide clear evidence for cause-and-effect relationships.
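Because a cross-sectional sample records disease and exposure at the same moment, the natural summaries are prevalence and the prevalence ratio (prevalence among the exposed divided by prevalence among the unexposed). The sketch below computes both from a 2x2 table of counts; the class name `CrossSectionalCounts` and all counts are made up for illustration and are not taken from Example 1. A ratio above 1 indicates an association in the snapshot, but says nothing about which came first.

```python
from dataclasses import dataclass


@dataclass
class CrossSectionalCounts:
    """Counts from a hypothetical cross-sectional sample, cross-classified
    by exposure and disease status at the same point in time."""
    exposed_diseased: int
    exposed_healthy: int
    unexposed_diseased: int
    unexposed_healthy: int

    def prevalence(self) -> float:
        """Overall prevalence: diseased participants / all participants."""
        diseased = self.exposed_diseased + self.unexposed_diseased
        total = diseased + self.exposed_healthy + self.unexposed_healthy
        return diseased / total

    def prevalence_ratio(self) -> float:
        """Prevalence among the exposed divided by prevalence among the unexposed."""
        p_exposed = self.exposed_diseased / (self.exposed_diseased + self.exposed_healthy)
        p_unexposed = self.unexposed_diseased / (self.unexposed_diseased + self.unexposed_healthy)
        return p_exposed / p_unexposed


# Made-up numbers for illustration only.
sample = CrossSectionalCounts(exposed_diseased=60, exposed_healthy=140,
                              unexposed_diseased=80, unexposed_healthy=320)
print(f"prevalence = {sample.prevalence():.3f}")               # 140 / 600
print(f"prevalence ratio = {sample.prevalence_ratio():.2f}")   # (60/200) / (80/400)
```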
Advantages of cross-sectional studies:
- Cross-sectional studies can be conducted at far lower cost than experiments and other types of observational studies.
- Allow researchers to compare many different variables at the same time.
- Useful initial studies to generate hypotheses.
Disadvantages of cross-sectional studies:
- May not provide clear evidence for cause and effect relationships.
- Do not provide a clear natural progression of the development of the disease (do not know what happens before or after the snapshot is taken).
The study described in Example 2 is a case-control study. It compares a group of children with leukemia (cases) with a comparison group of children with similar characteristics but without leukemia (controls) in order to determine the relationship between residential exposure to magnetic fields of the kind produced by power lines and leukemia status in children.
In general, a case-control study is one in which the participants who have a disease or outcome of interest (called cases) are matched with participants with similar characteristics but who do not have the disease or outcome of interest (called controls). The goal is to determine the relationship between the exposure to some risk factors and the disease (or outcome of interest). This kind of study is retrospective in nature, meaning that it looks back in time (e.g. reviewing medical history and lifestyle history of the participants) to learn what risk factors may be associated with the disease or the outcome of interest.
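The usual measure of association in a case-control study is the odds ratio (OR): the odds of exposure among the cases divided by the odds of exposure among the controls. The sketch below assumes an unmatched design summarized in a 2x2 table, uses made-up counts (not the data from Example 2), and computes a confidence interval with Woolf's log method, which is one common choice among several. An OR above 1 suggests the exposure is associated with the disease; an OR below 1 would point to a protective factor.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio and Woolf (log-based) confidence interval for a 2x2 table:
        a = exposed cases,    b = exposed controls,
        c = unexposed cases,  d = unexposed controls.
    This simple version requires all four cells to be positive."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


# Hypothetical counts for illustration only.
or_hat, lo, hi = odds_ratio(a=40, b=20, c=60, d=80)
print(f"OR = {or_hat:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```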
Instead of risk factors, a case-control study can also be used to determine the relationship between a set of protective factors and an outcome of interest. For example, a group of elderly women who have osteoporosis is matched with a similar group of elderly women who show no sign of osteoporosis. These participants are then interviewed to determine the relationship between diet (e.g. calcium intake) and osteoporosis.
As much as possible, the controls should be matched or paired appropriately with the cases in order to limit the effect of confounding variables.
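When controls are individually matched to cases, as in Example 2, the analysis should respect the pairing. For strict 1:1 matching, a standard estimate is the matched-pair (conditional) odds ratio: only discordant pairs carry information, and the estimate is the number of pairs in which only the case was exposed divided by the number in which only the control was exposed. The sketch below assumes strict 1:1 pairs, which is a simplification (Example 2 had 638 cases and 620 controls); the function name and the pair counts are hypothetical.

```python
def matched_pair_odds_ratio(pairs: list[tuple[bool, bool]]) -> float:
    """Matched-pair odds ratio for 1:1 matched case-control data.

    Each element is (case_exposed, control_exposed) for one matched pair.
    Concordant pairs (both exposed or both unexposed) are ignored:
        b = pairs where the case is exposed and the control is not
        c = pairs where the control is exposed and the case is not
    The estimate is b / c.
    """
    b = sum(1 for case_exp, ctrl_exp in pairs if case_exp and not ctrl_exp)
    c = sum(1 for case_exp, ctrl_exp in pairs if ctrl_exp and not case_exp)
    if c == 0:
        raise ValueError("no pairs with only the control exposed; estimate undefined")
    return b / c


# Hypothetical matched pairs (True = exposed), for illustration only.
pairs = ([(True, False)] * 30 + [(False, True)] * 15 +
         [(True, True)] * 10 + [(False, False)] * 45)
print(f"matched-pair OR = {matched_pair_odds_ratio(pairs):.2f}")
```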
Advantages of case-control studies:
- They are relatively inexpensive and can be conducted over a relatively short period of time.
- They do not require a large number of cases.
- They allow researchers to look at multiple risk factors (or protective factors).
- They are useful initial studies to establish association.
- These studies avoid the ethical issue of deliberately exposing participants to harmful risk factors, since the cases already have the disease.
- A great way to study rare diseases or diseases that take a long time to develop.
Disadvantages of case-control studies:
- Recall bias. Because these studies are retrospective, they rely on the participants to recall facts of the past. Participants who have the disease (the cases) may be more motivated than the controls to search their memories for, and report, past exposure to the risk factors.
- May be difficult or impossible to validate the information obtained in interviews or questionnaires.
- It can be difficult to find suitably matched participants for the control group.
- Cannot calculate the rate of disease.
- Can only study one disease or outcome of interest.
- Cannot provide data on the development of the disease or outcome of interest.
- Control of confounding variables is usually incomplete.
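Recall bias is a form of differential misclassification, and its effect on the odds ratio can be worked out directly. The sketch below assumes that truly exposed cases and controls report their exposure with different probabilities (the "recall" values), that no unexposed person reports exposure, and that true exposure prevalence is the same in both groups, so the true OR is 1. All numbers and function names are hypothetical. Even with no real association, more complete recall among the cases produces an observed OR above 1.

```python
def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1 - p)


def observed_odds_ratio(p_case: float, p_control: float,
                        recall_case: float, recall_control: float) -> float:
    """Odds ratio that would be observed when exposure is self-reported.

    Simplifying assumptions (for illustration only):
      * p_case / p_control are the TRUE exposure prevalences in cases / controls;
      * a truly exposed person reports exposure with probability recall_*;
      * nobody who was unexposed reports exposure (perfect specificity).
    """
    reported_case = p_case * recall_case
    reported_control = p_control * recall_control
    return odds(reported_case) / odds(reported_control)


true_or = odds(0.30) / odds(0.30)  # same true prevalence, so no real association
biased_or = observed_odds_ratio(0.30, 0.30, recall_case=0.95, recall_control=0.70)
print(f"true OR = {true_or:.2f}, observed OR with recall bias = {biased_or:.2f}")
```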
The study described in Example 3 is a cohort study. In this study, participants from several cohorts (the residents of Framingham, Massachusetts, in the 1940s and their descendants) were followed prospectively over time to determine which risk factors are associated with the development of cardiovascular disease (CVD).
In The Framingham Heart Study, the original cohort and the subsequent cohorts (the descendants) entered the study free of CVD. They were followed over a long period of time to provide data to establish the association between long-term exposure to certain risk factors and the development of CVD. Thus a cohort study such as The Framingham Heart Study can tell us what circumstances in early life are associated with the population’s characteristics in later life.
A cohort is a group of people who are linked in some way or who share a common characteristic within a defined period (e.g. people born in a given year or people with a certain health condition or people who live in the same city).
The key characteristic of a cohort study is that at the beginning of the study, the participants have not yet developed the disease or outcome of interest, and they are followed over a long period of time to determine the association between long-term exposure to certain risk factors and the development of a disease or outcome of interest. Cohort studies can also be conducted retrospectively (e.g. from archived records).
Advantages of cohort studies:
- More powerful than other observational study designs (e.g. more likely to suggest cause-and-effect relationships).
- Can provide complete data on the participants’ exposure to the risk factors and experiences after the exposure.
- Better quality control of data.
- Can provide a clear picture of the natural progression of the development of the disease or outcome of interest because of the long-term nature of cohort studies.
- Can study multiple outcomes related to one specific type of exposure.
- Can calculate rate and risk of diseases.
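Because a cohort starts out disease-free and records who develops the disease and how long each person was followed, both the risk (cumulative incidence) and the incidence rate (new cases per unit of person-time) can be computed for exposed and unexposed groups. The sketch below uses made-up follow-up data (not Framingham results) and a hypothetical `CohortGroup` class. The risk ratio and the rate ratio differ slightly here because the two groups accrued different amounts of person-time per participant.

```python
from dataclasses import dataclass


@dataclass
class CohortGroup:
    """Follow-up summary for one exposure group in a hypothetical cohort."""
    n_at_start: int       # disease-free participants enrolled
    new_cases: int        # participants who developed the disease during follow-up
    person_years: float   # total time at risk contributed by the group

    def cumulative_incidence(self) -> float:
        """Risk: proportion of the initially disease-free group who became cases."""
        return self.new_cases / self.n_at_start

    def incidence_rate(self, per: float = 1000.0) -> float:
        """New cases per `per` person-years of follow-up."""
        return self.new_cases / self.person_years * per


# Made-up follow-up data for illustration only.
exposed = CohortGroup(n_at_start=1000, new_cases=120, person_years=9400.0)
unexposed = CohortGroup(n_at_start=2000, new_cases=120, person_years=19400.0)

risk_ratio = exposed.cumulative_incidence() / unexposed.cumulative_incidence()
rate_ratio = exposed.incidence_rate() / unexposed.incidence_rate()
print(f"risk ratio = {risk_ratio:.2f}, rate ratio = {rate_ratio:.2f}")
```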
Disadvantages of cohort studies:
- Expensive to carry out, since a large number of participants and skilled staff are usually required.
- The disease or outcome of interest could take a long time to develop, usually many years or even decades.
- Circumstances of the participants may change during the study.
- May be difficult to maintain a high rate of retention of the participants.
If one thinks of a cross-sectional study as taking a snapshot at one point in time, then a cohort study is a series of snapshots taken over an extended period of time, usually years or even decades. Thus a cohort study is a superior observational methodology that can provide a clear picture of the natural progression of the development of the disease or outcome of interest.
A few comments comparing case-control studies and cohort studies are in order. A cohort study starts out with a group of participants who are free of the disease of interest and observes the occurrence of the disease along with the exposure status for the hypothesized risk factors. A case-control study, on the other hand, starts out with documented cases of the disease of interest and investigates possible causes of the disease.
There is another important difference between case-control studies and cohort studies. In a case-control study, the cases already have the disease of interest, and the investigators decide how many cases and how many controls to enroll. Thus the proportion of subjects with the disease reflects the study design rather than the frequency of the disease in the population, and the incidence rate of the disease cannot be calculated. Instead, a case-control study is concerned with the frequency and the amount of exposure in the cases and in the controls.
On the other hand, cohort studies are concerned with the frequency of disease in the subjects (some are exposed to the studied risk factors and some are not). Thus data are available to calculate the incidence rate of disease.
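To see why a case-control study cannot yield disease risks, note that the investigator fixes the number of controls per case. The sketch below uses hypothetical counts and keeps the cases fixed while enrolling either 100 or 400 controls with the same exposure split. "Risks" computed as if the sample were a cohort change with that design choice, while the odds ratio stays the same, which is why the odds ratio is the measure reported in case-control studies.

```python
def naive_risks(cases: tuple[int, int], controls: tuple[int, int]) -> tuple[float, float]:
    """Each tuple is (exposed, unexposed). Computes 'risk' as if the case-control
    sample were a cohort. These are NOT valid disease risks: they change with
    the number of controls the investigator chooses to recruit."""
    risk_exposed = cases[0] / (cases[0] + controls[0])
    risk_unexposed = cases[1] / (cases[1] + controls[1])
    return risk_exposed, risk_unexposed


def odds_ratio(cases: tuple[int, int], controls: tuple[int, int]) -> float:
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (cases[0] * controls[1]) / (cases[1] * controls[0])


cases = (40, 60)  # hypothetical: exposed, unexposed cases
for controls in [(20, 80), (80, 320)]:  # 1 vs 4 controls per case, same exposure split
    r_exp, r_unexp = naive_risks(cases, controls)
    print(f"controls={controls}: 'risks' = {r_exp:.3f}, {r_unexp:.3f}; "
          f"OR = {odds_ratio(cases, controls):.2f}")
```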
When the goal of a statistical study is to understand cause and effect, experiments are the only way to obtain convincing evidence for causation. In experiments, researchers can manipulate one factor while controlling the others. Thus the experimental method helps control the effect of lurking variables that may have an effect on the outcome being studied. Without being controlled, these confounding variables can distort the results of the study and can lead to false conclusions about cause and effect. In observational methodologies, information on these extraneous variables (if they are known to the investigators) is collected and quantitatively adjusted for. Because the control of the confounding or extraneous variables is less strict, results from observational methodologies are generally less conclusive than those obtained from experiments. For this reason, randomized controlled designs are regarded as the gold standard for clinical trials. In general, experimental designs sit at the top of the hierarchy of evidence in scientific research.
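One common way an observational study "quantitatively adjusts for" a known confounder is stratification: estimate the association within each level of the confounder, then pool. The Mantel-Haenszel odds ratio is one standard pooled estimator (regression modelling is another, not shown). In the hypothetical data below, age acts as a confounder: older people are both more often exposed and more often diseased, so the crude odds ratio is about 3.3, while within each age stratum the odds ratio is exactly 1. This adjustment only works for confounders that were measured.

```python
Stratum = tuple[int, int, int, int]


def crude_or(strata: list[Stratum]) -> float:
    """Odds ratio after collapsing all strata into a single 2x2 table."""
    a = sum(s[0] for s in strata)
    b = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    d = sum(s[3] for s in strata)
    return (a * d) / (b * c)


def mantel_haenszel_or(strata: list[Stratum]) -> float:
    """Mantel-Haenszel odds ratio pooled across strata of a confounder.
    Each stratum is (a, b, c, d):
        a = exposed with outcome,    b = exposed without outcome,
        c = unexposed with outcome,  d = unexposed without outcome."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical data stratified by age group (made up for illustration).
young = (5, 95, 20, 380)    # within-stratum OR = 1
old = (120, 180, 40, 60)    # within-stratum OR = 1
strata = [young, old]
print(f"crude OR = {crude_or(strata):.2f}, "
      f"Mantel-Haenszel OR = {mantel_haenszel_or(strata):.2f}")
```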
However, there are situations in which experiments are impractical or unethical to conduct. For example, the epidemiological studies that aim to determine the relationship between exposure to some harmful substance or risk factors and a disease cannot be conducted as experiments. Thus, observational studies have their place and can often provide the best source of information in the absence of experimental evidence.
Of the three major types of observational studies we discuss, cohort studies typically provide the strongest evidence for cause-and-effect relationships. In general, the ranking of the strength of evidence among the three observational designs (from strongest to weakest) is: cohort study, case-control study, cross-sectional study.