RANDOMIZED CONTROLLED CLINICAL TRIALS IN REPRODUCTIVE HEALTH
J. Villar and *G. Carolli
Randomized controlled trials (RCTs) are now well accepted as the preferred means to evaluate clinical treatments, preventive-screening manoeuvres and health and educational interventions. In recent years, systematic reviews of randomized trials, generally presented as meta-analyses, have been published in many areas of medicine, including obstetrics (5). This is a major development for our specialty, until recently known as one of the areas of medicine with less rigorous evaluation (12). The discussion here covers relevant practical and methodological issues related to RCTs, using published examples from the obstetrical literature. A comprehensive discussion of RCTs can be found in the textbook of Meinert (17) or, specifically oriented to reproductive health, in Wingo et al. (25).
A recent large prospective study of the effect of physical demand and work during gestation on pregnancy outcome demonstrated a linear increase in the incidence of small-for-gestational-age (SGA) infants as the physical demand on the mother increased (16). However, it is well known that work (at home or outside) during pregnancy is also independently associated with other maternal characteristics, such as family income, woman’s age, nutritional status and obstetrical history. Furthermore, these maternal characteristics are also known to be independently associated with several pregnancy outcomes, including the rate of SGA.
Therefore, there are characteristics (variables) of these mothers that are related to the two factors of interest in this study: the level of their work and the outcome of the pregnancies. In general, these variables are associated with the disease under study and with the exposure (of the subjects) to some agent or condition. In the evaluation of medical treatment, these variables are associated independently with the treatment under study and the disease to be treated or prevented. These characteristics are considered confounding variables and they falsely obscure or accentuate the relationship between the treatment or exposure and the disease under study (4,15,17). The effect produced by confounding variables should be prevented in the study design or controlled for during the analysis.
In the previous example, family income, maternal age, nutritional status and obstetrical history are possible confounding variables in the relationship between work and pregnancy outcome. For example, women who worked in offices had a lower rate of small-for-gestational-age infants than those who worked in "manual" activities (16.8% versus 20.8%), but they also had higher income, were taller and had fewer previous low birth weight infants (Table 1). All these factors are associated with both the maternal working conditions (office/"manual") and the incidence of SGA; they are confounding variables.
A distinction should be made between two terms that can have similar meanings: confounding variables and selection bias. The most important source of bias in a study evaluating a treatment modality is introduced in the selection of participants for alternative forms of treatment. The distinction between selection bias and a confounding variable lies in whether the bias can be controlled in the analysis. If the "cause" or source of selection bias is known and can be controlled in the analysis, the problem is similar to a confounding variable, because it is at least theoretically removable. Unfortunately, in most cases, the source of selection bias cannot be measured or is very difficult to identify.
The control of confounding variables
The effect of confounding variables can be: (a) prevented, (b) assessed or (c) controlled during different stages of a research project (19). The objective of preventing the confounding effect is to obtain study groups that are similar at the beginning of the treatments in the distribution of the possible confounding variables. Randomization during experimental studies and matching and/or restriction to those individuals without the confounding factors in experimental and non-experimental studies are the available strategies for prevention.
The assessment of the confounding effect in experimental and non-experimental studies is performed by comparing the crude estimate of the association between the risk factor or the treatment and the outcome with the estimate after the confounding effect is removed. The discrepancy between these two estimates gives the magnitude of the confounding effect. "P-value" analysis has no place in this process.
An example of the comparison between the crude estimate and the effect after removing the confounding factor is presented in Table 2. The association between the ponderal index of newborns (weight/length³) and two indicators of neonatal morbidity was studied in a large prospective study (23). The crude analysis demonstrates that newborns with low ponderal index (underweight for length) have 5.6 times more risk of perinatal distress and 3.0 times more risk of low Apgar score at five minutes than those newborns with normal ponderal index (normal weight for length).
However, birth weight is known to be associated with both neonatal outcomes as well as with ponderal index, for it is part of the calculation. Therefore, an adjusted statistical analysis removing the effect of birth weight was also conducted. As can be seen in the right column of Table 2, after removing the effect of birth weight, the risk of perinatal distress and low Apgar score is reduced for newborns with low ponderal index. The discrepancy between these two estimates gives the magnitude of the confounding effect of birth weight in the association between ponderal index and neonatal morbidity.
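The comparison of a crude and an adjusted estimate can be sketched in a few lines. The example below is only an illustration: the counts are hypothetical (they are not the data of Table 2), the two strata stand in for low and normal birth weight, and the Mantel-Haenszel pooled risk ratio is used as one common way of removing the effect of a stratification variable, not necessarily the method used in the cited study.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel risk ratio pooled over strata of the confounder.

    Each stratum is a tuple:
    (exposed_cases, exposed_total, unexposed_cases, unexposed_total).
    """
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in strata:
        total = n1 + n0
        numerator += a * n0 / total
        denominator += c * n1 / total
    return numerator / denominator


# Hypothetical counts (assumption, not from Table 2).
# "Exposed" = low ponderal index; outcome = neonatal morbidity.
strata = [
    (60, 200, 20, 100),   # low birth weight stratum: risks 30% vs 20%
    (6, 100, 36, 900),    # normal birth weight stratum: risks 6% vs 4%
]

# Crude estimate: ignore the strata and add up the columns.
crude = risk_ratio(
    sum(s[0] for s in strata), sum(s[1] for s in strata),
    sum(s[2] for s in strata), sum(s[3] for s in strata),
)
adjusted = mantel_haenszel_rr(strata)

print(f"crude RR = {crude:.2f}, adjusted RR = {adjusted:.2f}")
print(f"crude / adjusted = {crude / adjusted:.2f}")
```

In these invented data the risk ratio is 1.5 within each stratum, yet the crude ratio is much larger because low ponderal index and morbidity are both more frequent among low birth weight infants. The gap between the two estimates, not a p-value, is what measures the confounding.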
Finally, the control of confounding variables can be done in experimental and non-experimental studies using analytic and statistical strategies such as stratified analysis and multivariable analysis. A detailed description of these methods can be found in most standard epidemiological and statistical textbooks and will not be reviewed here.
It should be remembered that all these processes must be considered well in advance of the implementation of the study, in the context of a well-defined hypothesis. In this paper we will review the randomization process and the implementation of randomized controlled trials as a "preventive" measure for confounding effects and selection bias.
Randomization is a process in experimental studies in which study subjects are assigned to treatment or control groups randomly. The purpose of this method is to reduce selection bias and prevent confounding. Subjects should have the same probability of being included in the treatment or control groups.
Randomization is a powerful method for preventing confounding, as it is the only one which can control for known and unknown risk factors or confounding variables; if the process of subject allocation is performed correctly, randomization is resistant to external manipulation. Thus, it is expected that known and unknown prognostic variables are randomly distributed between study groups and that randomization will create study groups having similar incidence of the disease or outcome when the treatment under evaluation has no effect. Table 3 summarizes the main properties of randomization.
TABLE 3. Properties of randomization.
Randomization is less effective in achieving these objectives in studies of small sample size; in most of these cases, variables known to be confounders should also be controlled for in the analysis of these trials. For example, a small clinical trial was conducted by our group to evaluate the effect of calcium supplementation during pregnancy on the rate of blood pressure increase during the third trimester of gestation (21). Twenty-seven patients were randomized to the placebo group and 25 to the calcium-supplemented group. Despite the randomization, the diastolic blood pressure at the beginning of supplementation (24th week of gestation) was 53.4 ± 11.2 mm Hg in the placebo group, lower than the value in the calcium group (56.5 ± 8.8 mm Hg).
Blood pressure during early pregnancy is independently associated with blood pressure at term, as well as possibly being associated with the response of blood pressure to the supplementation. A decision was made that initial diastolic blood pressure should be controlled for in all statistical analyses of this randomized controlled trial (21).
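The dependence of chance imbalance on sample size can be illustrated with a small simulation. This is a sketch under stated assumptions: both groups are drawn from the same hypothetical population (mean 55, SD 10, loosely resembling a diastolic blood pressure scale), which is what random allocation from a large pool amounts to; the function name and all parameter values are illustrative and not taken from the cited trial.

```python
import random
import statistics


def mean_baseline_gap(n_per_group, n_trials=500, seed=1):
    """Average absolute difference between the two group means of a
    baseline variable, over many simulated randomized trials."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Both groups come from the same population: any gap is chance.
        group_a = [rng.gauss(55.0, 10.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(55.0, 10.0) for _ in range(n_per_group)]
        gaps.append(abs(statistics.fmean(group_a) - statistics.fmean(group_b)))
    return statistics.fmean(gaps)


for n in (25, 400):
    print(f"n per group = {n}: mean chance gap = {mean_baseline_gap(n):.2f}")
```

With about 25 subjects per group, chance differences of a few units between group means are common, while with several hundred per group they largely disappear. This is why known confounders may still need to be controlled for in the analysis of small randomized trials.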
Randomized controlled trials
When are RCTs necessary and appropriate? This type of study design is needed to evaluate treatments or interventions for important clinical diseases or for those cases in which there are uncertainties regarding the effectiveness of available treatments or forms of care. The anticipated effect of the new intervention on the selected outcomes is generally moderate. Thus, most of the new treatments for chronic disease, obstetrical and gynecological morbidities or screening methods during pregnancy qualify for this study design. RCTs are not limited to medical or pharmacological treatments, as psychological and social interventions should also be rigorously evaluated (6,14).
Randomized controlled trials are not indicated in the initial stages of the evaluation of the etiology of a disease or when the prognosis or the short- or long-term consequences of an exposure or condition are under investigation. Nevertheless, RCTs can play an important role in the investigation of a disease’s etiology when clinical research or observational studies are not sufficient. For example, after more than a decade of an iatrogenic epidemic of blindness among premature infants, the role of high oxygen therapy was tested in a large multicentre randomized trial, and the unequivocal results of this trial ended the epidemic (10).
As randomization is a method to prevent confounding, RCTs could be considered unnecessary: (a) in those cases in which all confounding variables are known; (b) when the prognosis of the condition under treatment is also known with certainty; and (c) when the expected treatment effect is very large. However, caution should be exercised when deciding against an RCT to evaluate an intervention or treatment, even when the theory behind the effect of such an intervention appears to be logically sound.
The randomization process
In most RCTs, the basic subjects of randomization are individuals or patients who are allocated to a placebo or a new or established form of care. The research unit in charge of the randomization process can be centrally located (clinicians communicate with it by fax, telex or telephone) or at the clinic/hospital level. Once an eligible subject agrees to participate in the study, she is randomized and treatment starts. It is very important to reduce the time between randomization and entry of patients to the intervention/ control regimen.
The selection of the randomization method and of who will implement it depends on the circumstances of the research project; however, there are important issues to consider when selecting the method: the process must be (a) formal, (b) unpredictable, (c) reproducible and (d) secure, and it must (e) have mathematical properties. As tossing a coin is not reproducible, it is to be discouraged. The most important characteristic is that the treatment allocation of the next patient is unpredictable. For example, systematic schedules such as allocating every other patient or allocating by day of the week do not fulfill this requirement and should be avoided. Furthermore, the treatment schedule should be unknown to all members of the research team until needed for the initiation of the treatment (in unblinded studies), and should be completely masked until the completion of the trial in blinded studies. Thus, birthdate, social security number and odd-even schemes should also be avoided.
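As an illustration of a formal and reproducible allocation process, the sketch below generates an allocation list from a seeded pseudo-random number generator. Several choices here are assumptions made for the example and do not come from the text: the use of permuted blocks of randomly varying size, the arm labels, the block sizes and the seed value. Python's default generator is reproducible but not cryptographically secure, so in practice the seed and the list would have to be held by someone independent of recruitment.

```python
import random


def allocation_list(n_subjects, block_sizes=(4, 6), seed=20240101):
    """Allocation sequence built from permuted blocks of random size.

    Within each block, half of the slots go to each arm in random order.
    The same seed always reproduces the same list, which allows audit;
    the list itself must stay concealed from the recruiting clinicians.
    """
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_subjects:
        size = rng.choice(block_sizes)          # block size is itself random
        block = ["treatment"] * (size // 2) + ["control"] * (size // 2)
        rng.shuffle(block)                      # random order inside the block
        sequence.extend(block)
    return sequence[:n_subjects]


schedule = allocation_list(20)
for number, arm in enumerate(schedule, start=1):
    print(f"subject {number:02d}: {arm}")
```

Varying the block size makes the next allocation harder to guess than with a fixed block, while the seed keeps the whole list reproducible. A coin toss offers neither an audit trail nor reproducibility, and an alternating or date-based scheme fails on predictability.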
Table 4 lists several randomization methods used in the literature from the less rigorous (*) to the most bias-safe alternatives (*****). Researchers should select methods based on the above-listed characteristics and adopt the one most suitable to their needs, without compromising the quality of the process. It should be kept in mind that this is the cornerstone of the study and even when tedious, the best methodological option should be selected.
The statistical properties of several mechanisms of randomization have been extensively discussed, and the conclusions and recommendations were recently published (13). Other recommended reading includes a very useful summary that was prepared for the British Medical Journal (8).
It is often asked whether or not the randomization process has worked. The process should be evaluated primarily by continuously monitoring, during the study, the methods underlying it and being satisfied that no bias has been introduced. Performing a statistical calculation leading to a "p-value" during the analysis does not answer the question. For example, a large difference between treatment groups in the distribution of a baseline variable with a p-value >0.05 does not necessarily mean that assignment was random. Conversely, a small difference in a baseline variable with a p-value <0.05 does not mean that the randomization process did not work or that this variable is a confounding factor.
The study design of the randomized clinical trial is an important point in the avoidance of moderate bias and random error. The statement of the objective should be clearly specified, including the anticipated effect of the principal measure of outcome. The expected treatment effect is crucial to the calculation of the sample size.
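The link between the expected treatment effect and the sample size can be made concrete with the usual normal-approximation formula for comparing two proportions. This is a minimal sketch: the formula is a standard textbook approximation rather than one given in this chapter, and the function name, the outcome rates, the 5% two-sided significance level and the 80% power are illustrative assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate number of subjects per group needed to detect a
    difference between two proportions (two-sided test, equal groups)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2
    return math.ceil(n)


# Hypothetical outcome rate of 10% in the control group.
for p_treatment in (0.05, 0.075):
    n = sample_size_per_group(0.10, p_treatment)
    print(f"10% vs {p_treatment:.1%}: about {n} subjects per group")
```

Halving the anticipated reduction (from 5 to 2.5 percentage points) more than quadruples the required number of subjects, which is why an optimistic statement of the expected effect leads directly to an underpowered trial.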
Fig. 1 presents the flow chart of a large RCT recently completed by our group, evaluating an intervention of psychosocial support during pregnancy on birth weight, gestational age and maternal health (14,24). Possible candidates for enrollment in the study were screened using a list of inclusion and exclusion criteria. Complete baseline information was obtained from those who were eligible for the study, but basic descriptive information was also obtained from those women excluded before randomization. These data were used for the evaluation of the representativeness of the study population.
The time of individual randomization (about the 22nd week of gestation in the example) represents the crucial point of the study. Subjects randomized cannot be excluded from the final analysis and should be part of the study population regardless of any follow-up experience.
Subjects excluded before randomization affect the composition of the study population (external validity) and the generalizability of the study results. Homogeneous study populations, selected using restricted entry criteria, will have fewer confounding variables (these were excluded), but the results of the study will be less generalizable. Subjects lost after randomization affect the comparability of the groups or treatments, the main objective of the study (internal validity).
The following points should also be considered:
Table 5 presents a practical checklist for items to be included in the description of the study design (usually included in the Materials and Methods of published articles).
Randomization does not necessarily produce comparable groups. There can be minor and sometimes major differences in the baseline comparisons between groups (1). These baseline differences between groups should always be evaluated, but not using a statistical test (19). Descriptive statistics such as standard deviation, range and selected centiles, as well as the mean or median, should always be used to evaluate the distribution of baseline variables. Standard errors and confidence intervals are not descriptive measures and should not be used for this purpose. Unfortunately, a recent evaluation of 80 reports of randomized clinical trials published in four leading general medical journals indicates that in almost half of the trials, the presentation of baseline data was unsatisfactory (1).
Table 6 presents the baseline characteristics of the two groups in the psychological support during pregnancy study (24). Overall, the two groups had similar demographic, obstetrical and psychological characteristics at baseline. Note that no statistical comparisons are presented. Although the baseline distribution of psychological and social support characteristics was very similar between groups (see last five variables in Table 6), stratified analysis was conducted by the last two summary variables because of their possible important role in the effect of the intervention (see Table 5 of ).
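A descriptive baseline summary of the kind recommended here needs nothing more than the following. The sketch uses Python's standard library; the function name is illustrative and the maternal weights are invented for the example, not taken from any of the cited trials.

```python
import statistics


def describe(values):
    """Descriptive summary of one baseline variable in one study group:
    mean, standard deviation, range and selected centiles.
    No standard error, confidence interval or test is computed."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "p25": q1,
        "median": median,
        "p75": q3,
        "max": max(values),
    }


# Hypothetical maternal weights (kg) in one group.
placebo_weights = [52.0, 58.5, 61.0, 63.5, 67.0, 70.5, 74.0, 81.5]
summary = describe(placebo_weights)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

The same summary would be produced for each group and presented side by side, leaving the judgment of whether a difference matters to clinical knowledge rather than to a significance test.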
Table 7 presents descriptive information for patients enrolled in a randomized controlled trial designed to evaluate the effectiveness of 200 mg of doxycycline given orally at the time of IUD insertion in reducing the incidence of pelvic inflammatory disease. Groups had very similar characteristics, although the mean number of previous live births and the years of education had p values <0.05. The differences, however, do not have any biological importance (25) and should not be considered as imbalance among groups, despite the low p-value.
If the randomization was correctly conducted (unbiased), it is expected that, by chance, 5% of the baseline comparisons performed using statistical tests will have a p-value <0.05. For example, in 600 hypothesis tests performed in 46 published trials for baseline comparisons, 24 (4%) were statistically significant at the 5% level. Statistical tests were performed for 17 comparisons between baseline characteristics of two study groups in a randomized controlled trial of the effect of calcium supplementation on preterm delivery (23). Among these comparisons, maternal weight was lower in the calcium group (63.0 ± 10.6 kg) than in the placebo group (67.4 ± 13.5 kg) (p<0.01) (Table 8). This statistical difference can be due to chance; at a significance level of 5% (p<0.05), we expect approximately one of the 17 comparisons to be significant. The decision which must be made by the investigators is whether this difference is biologically important enough to independently influence the effect of calcium supplementation and the rate of preterm delivery. In this example, the authors considered that maternal weight differences of this magnitude should not be controlled for in the analysis, and that if any effect was present, it would be in the direction of increasing prematurity in the calcium group. Thus, it is clearly a biological decision and not a statistical one.
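The arithmetic behind these expectations is short enough to check directly. The sketch below assumes, for simplicity, that the baseline comparisons are independent of one another, which is rarely exactly true of correlated maternal characteristics; the function names are illustrative.

```python
def expected_significant(n_tests, alpha=0.05):
    """Expected number of 'significant' baseline comparisons under
    correct randomization, when every difference is due to chance."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha=0.05):
    """Probability that at least one of n independent comparisons is
    significant by chance alone (independence is an assumption)."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (17, 600):
    print(
        f"{n_tests} tests: expected {expected_significant(n_tests):.2f} significant, "
        f"P(at least one) = {prob_at_least_one(n_tests):.3f}"
    )
```

For 17 comparisons the expected number of chance findings is a little under one, and the probability of seeing at least one is above one half; for 600 comparisons the expectation is 30, close to the 24 actually observed.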
Evaluation of the impact of the treatment
As previously stated, if there is no treatment effect (the new drug and placebo/old drugs are not different in improving the main outcome of interest) and if the randomization is well-conducted with an adequate sample size, the incidence of the main outcome should be similar in both groups.
A simple comparison of the incidence rates in the two groups, the rate ratio (the ratio of the two incidence rates) and the confidence interval of the rate ratio should be sufficient to present the results. This comparison should include all subjects who were originally randomized, regardless of whether they completed the treatment or any follow-up. For the example of the psychosocial intervention during pregnancy study, Table 9 presents the standard format for the presentation of the main crude results.
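The ratio of two incidences and its confidence interval can be computed as follows. This is a sketch under stated assumptions: incidence is treated as a proportion of subjects (cumulative incidence), the interval uses the common log-scale normal approximation, and the counts are hypothetical rather than those of Table 9. The denominators are all subjects randomized to each arm, whatever their later compliance or follow-up.

```python
import math
from statistics import NormalDist


def rate_ratio_ci(cases_treat, n_treat, cases_ctrl, n_ctrl, level=0.95):
    """Ratio of two incidence proportions with a confidence interval
    based on the normal approximation on the log scale."""
    rr = (cases_treat / n_treat) / (cases_ctrl / n_ctrl)
    se_log = math.sqrt(
        1 / cases_treat - 1 / n_treat + 1 / cases_ctrl - 1 / n_ctrl
    )
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical trial: 90 events among 1000 randomized to the intervention,
# 100 events among 1000 randomized to control.
rr, lower, upper = rate_ratio_ci(90, 1000, 100, 1000)
print(f"rate ratio = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that includes 1.0, as in this invented example, is compatible with no treatment effect; its width also shows the reader how large an effect the trial could not exclude.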
However, it could be possible that baseline differences between groups are detected and that these characteristics are considered as possible confounding variables. Thus, an unconfounded evaluation of the response to the treatment is desirable. Four analytic strategies can be used to obtain this unconfounded evaluation:
Finally, it is important to keep the number of analyses to a minimum, specifically to those that were originally planned. False positive results are often due to multiple analyses. Secondary analyses should also be limited and considered only hypothesis-generating, not hypothesis testing.
The problem of small trials
Small trials with an insufficient number of cases are very common in clinical research. Two recent reviews demonstrated that up to two-thirds of all published reports in leading medical journals did not provide power calculations or did not give reasons for termination of subjects’ recruitment (1,18). True differences observed between treatment groups are more likely to be attributed to chance in small trials (false negative results). Furthermore, statistically significant differences observed in small trials tend to overestimate true biological differences. In other words, when applied to other populations, the treatment effect should be expected to be less dramatic than the one observed in the original small trial.
We think that all these considerations should be carefully reviewed during the planning or evaluation of a randomized controlled trial. As a more general recommendation, we offer this quote from J. Cornfield: "On being asked to talk on the principles of research, my first thought was to rise after the chairman’s introduction, to say ‘be careful’ and to sit down." (7).
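Both problems of small trials, frequent false negative results and exaggerated effects among the significant ones, can be seen in a simple simulation. Everything in this sketch is an assumption chosen for illustration: the true outcome rates (20% in control, 15% under treatment), the group sizes, the number of simulated trials, the seed and the use of a simple z test for two proportions.

```python
import math
import random
import statistics


def simulate(n_per_group, p_control=0.20, p_treatment=0.15,
             n_trials=1000, seed=7):
    """Simulate many trials of one size. Returns the share of trials that
    are significant in favour of treatment, and the mean observed risk
    reduction among those significant trials."""
    rng = random.Random(seed)
    significant_effects = []
    for _ in range(n_trials):
        events_c = sum(rng.random() < p_control for _ in range(n_per_group))
        events_t = sum(rng.random() < p_treatment for _ in range(n_per_group))
        diff = (events_c - events_t) / n_per_group   # observed risk reduction
        pooled = (events_c + events_t) / (2 * n_per_group)
        se = math.sqrt(2 * pooled * (1 - pooled) / n_per_group)
        if se > 0 and diff / se > 1.96:              # significant benefit
            significant_effects.append(diff)
    share = len(significant_effects) / n_trials
    mean_effect = (statistics.fmean(significant_effects)
                   if significant_effects else float("nan"))
    return share, mean_effect


small_share, small_effect = simulate(50)
large_share, large_effect = simulate(800)
print("true risk reduction: 0.05")
print(f"n=50 per group:  {small_share:.0%} significant, "
      f"mean significant effect {small_effect:.3f}")
print(f"n=800 per group: {large_share:.0%} significant, "
      f"mean significant effect {large_effect:.3f}")
```

With 50 subjects per group most simulated trials miss the true effect, and those that reach significance report a risk reduction several times larger than the true one, because only extreme results can cross the significance threshold. With 800 per group most trials detect the effect and the significant estimates lie close to the truth.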
Edited by Aldo Campana,