Korean J Orthod 2024; 54(6): 374-391 https://doi.org/10.4041/kjod24.051
First Published Date July 26, 2024, Publication Date November 25, 2024
Copyright © The Korean Association of Orthodontists.
Samer Mheissen (a), Haris Khan (b), Mays Aldandan (c), Despina Koletsi (d,e)
(a) Private Practice, Damascus, Syria
(b) CMH Institute of Dentistry Lahore, National University of Medical Sciences, Lahore, Pakistan
(c) Private Practice, Daraa, Syria
(d) Clinic of Orthodontics and Pediatric Dentistry, Center of Dental Medicine, University of Zurich, Zurich, Switzerland
(e) Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, USA
Correspondence to: Samer Mheissen.
Specialist Orthodontist, Private Practice, Damascus 00963, Syria.
Tel: +963-15833179, e-mail: Mheissen@yahoo.com
How to cite this article: Mheissen S, Khan H, Aldandan M, Koletsi D. Unaccounted clustering assumptions still compromise inferences in cluster randomized trials in orthodontic research. Korean J Orthod 2024;54(6):374-391. https://doi.org/10.4041/kjod24.051
Objective: This meta-epidemiological study aimed to determine whether optimal sample size calculation was applied in orthodontic cluster randomized trials (CRTs). Methods: Orthodontic randomized clinical trials with a cluster design, published between January 1, 2017, and December 31, 2023, in leading orthodontic journals were sourced. Study selection was undertaken by two authors independently. The study characteristics and the variables required for sample size calculation were extracted by the authors. The design effect for each trial was calculated using an intra-cluster correlation coefficient of 0.1 and the number of teeth in each cluster, and was used to recalculate the sample size. Descriptive statistics for the study characteristics and summary values for the design effect and sample sizes were provided. Results: One hundred and five CRTs were deemed eligible for inclusion. Of these, 100 reported a sample size calculation. Nine CRTs (9.0%) did not report any effect measure for the sample size calculation, and a few did not report power assumptions or significance thresholds. Regarding the variables specific to the cluster design, only one CRT reported a design effect and adjusted the sample size accordingly. Recalculation indicated that the sample sizes of orthodontic CRTs should be increased by a median of 50% to maintain the same statistical power and significance level. Conclusions: Sample size calculations in orthodontic cluster trials were suboptimal. Greater awareness of the cluster design and its variables is required to calculate sample sizes adequately and to reduce the practice of underpowered studies.
Keywords: Cluster, Trials, Orthodontic, Cluster randomized trials
Randomized controlled trials (RCTs) are considered the cornerstone of evidence-based practice, serving as the gold standard for evaluating the effectiveness and/or safety of an intervention. A key feature of RCTs is the presence of an untreated control group followed up in parallel with the intervention group. In a simple parallel-arm design, randomization is implemented at the participant level, and the number of analyzed units equals the number of randomized units.1 However, variations in design may lead to differences between analyzed and randomized units.2 For example, researchers may randomize a group of individuals, rather than one individual, to receive an intervention.2-4 These groups, known as clusters, can include families, schools, villages, or dental practices. The cluster design has gained substantial interest in orthodontics and dentistry, as roughly one-quarter of published orthodontic5 and dental6 trials are structured as cluster designs, in which a group of teeth from each participant receives the same intervention as subunits of the cluster.
Sample size calculation is a fundamental step in RCTs to determine the appropriate number of patients during the design stage of a clinical trial. This calculation helps substantiate the importance, significance, and clinical relevance of the identified treatment effect. Overly large RCTs might unnecessarily expose patients to potentially ineffective or harmful treatments, which may be unethical or resource-consuming.7 Conversely, small RCTs may lack sufficient statistical power to detect clinically meaningful differences between interventions.7
Reporting the sample size calculation is required at an early stage, in the study protocol, to support the transparency, credibility, and reproducibility of research findings. Key components of a sample size calculation include the type I error (typically set at 0.05, or sometimes 0.01), the power (usually 80–90%), and assumptions about the expected estimates in the control and treatment groups, together with an effect size that reflects a clinically meaningful difference.
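For a standard parallel-arm RCT, these components map directly onto a basic calculation. The minimal sketch below uses base R's power.t.test() (R was one of the packages used for the analyses in this study); the standardized effect size of 0.5 is a hypothetical value chosen purely for illustration, not a figure from any included trial.

```r
# Per-group sample size for a two-arm parallel RCT from the usual components:
# type I error (alpha), power, and an assumed treatment effect.
# The standardized effect size of 0.5 SD is a hypothetical illustration.
calc <- power.t.test(delta = 0.5,       # expected difference between group means
                     sd = 1,            # assumed common standard deviation
                     sig.level = 0.05,  # type I error (alpha)
                     power = 0.80)      # desired statistical power
ceiling(calc$n)  # participants required per group (about 64 with these inputs)
```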
Variations in trial design may require specific sample size considerations and assumptions.8 For instance, in cluster randomized trials (CRTs), observations are correlated, whereas standard parallel-arm RCTs assume that observations are independent. In CRTs, this correlation is generally characterized by two parameters: the intra-cluster correlation coefficient (ICC; ρ) and the between-cluster coefficient of variation (k). Consequently, each individual within a cluster contributes less information than one independent individual, resulting in less unique information per participant and, therefore, reduced power.9,10 Thus, the sample size calculation for CRTs must be adjusted for clustering by inflating the sample size using the design effect:
D = 1 + (m–1)ρ
where “D” is the design effect, “m” is the number of individuals per cluster, and ρ is the ICC. In orthodontics, if multiple teeth receive the same intervention and contribute to the outcome, “m” equals the number of teeth involved from each participant.
For example, consider a trial assessing white spot lesion formation with two different bracket systems, A and B. To detect a meaningful difference between the two groups with 80% power and a 5% type I error, 200 teeth (10 patients with 20 teeth per patient) are required in each group under the assumption of independence between teeth. However, accounting for the number of teeth per patient (m = 20) and assuming an ICC (ρ) of 0.1, the design effect would be D = 1 + (20 – 1) × 0.1 = 2.9, and the required number would increase to 580 teeth per group, or approximately 29 patients per group. If the ICC (ρ) were instead 0.2, the design effect would be D = 1 + (20 – 1) × 0.2 = 4.8, increasing the required number to 960 teeth per group, or approximately 48 patients per group.
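As a minimal sketch, the following R code reproduces the arithmetic of this worked example; the tooth counts, cluster size, and ICC values are the illustrative figures above, not data from any trial.

```r
# Design effect for a cluster randomized trial: D = 1 + (m - 1) * rho
design_effect <- function(m, rho) 1 + (m - 1) * rho

n_independent <- 200  # teeth per group required under the independence assumption
m <- 20               # teeth contributing to the outcome per patient (cluster size)

for (rho in c(0.1, 0.2)) {
  D <- design_effect(m, rho)              # 2.9 for rho = 0.1; 4.8 for rho = 0.2
  n_teeth <- round(n_independent * D)     # teeth required per group after adjustment
  n_patients <- ceiling(n_teeth / m)      # approximate patients per group
  cat("ICC =", rho, "| design effect =", round(D, 2),
      "| teeth/group =", n_teeth, "| patients/group =", n_patients, "\n")
}
# Reproduces the figures above: 580 teeth (~29 patients) per group at ICC = 0.1,
# and 960 teeth (~48 patients) per group at ICC = 0.2.
```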
Previous studies assessed the adequacy of sample size calculations and found that the sufficiency and correctness of these calculations ranged from 7.3% to 35.6% in dental research11 and was 29.5% in orthodontic trials.12 Regarding variations in trial design, a previous assessment8 investigated sample size calculation in longitudinal trials and concluded that most calculations were suboptimal. However, to date, no study has assessed the correctness of sample size calculations and their specific requirements in CRTs. Therefore, the current study aimed to assess the correctness of sample size calculations in orthodontic CRTs and to quantify the extent of miscalculation by estimating the expected increase in sample size using the design effect.
Studies were included if they met the following criteria: (1) RCTs with a cluster design, in which multiple teeth or mini-implants within the same patient received an intervention and contributed to the outcome measures; (2) published between January 1, 2017, and December 31, 2023; and (3) published in one of the following six major orthodontic journals (as of 2023): European Journal of Orthodontics, the Angle Orthodontist, American Journal of Orthodontics and Dentofacial Orthopedics, Progress in Orthodontics, Orthodontics & Craniofacial Research, and the Korean Journal of Orthodontics.
Animal and preclinical studies were excluded. Studies with no clear details regarding cluster design and studies with designs other than clinical trials were also excluded.
An electronic search of MEDLINE via the PubMed database was undertaken by one author (SM), with the latest update on February 7, 2024, using text words and medical subject headings (Appendix 1). Records irrelevant to the eligible journals were removed, and two authors (SM, HK) performed the initial screening of the studies independently and in duplicate. Trials with interventions involving more than one tooth/mini-implant per patient were included in the full-text review. The same two investigators scrutinized the full texts of potentially eligible articles and evaluated them against the inclusion criteria. In the presence of any disagreement, a consensus was reached after discussion between the two authors.
Two authors independently extracted the following study characteristics: number of authors, continent of the first author (Europe, Americas, or Asia and others), journal and year of publication, study design (parallel, split-mouth, or crossover), and number of arms. The variables required for the sample size calculation were extracted by a single author (SM) after calibration with another author (MA) and entered into an Excel file (Microsoft, Redmond, WA, USA) equipped with the equation to calculate the design effect for each study based on an ICC of ρ = 0.1. This value is lower than the value of 0.2 reported in previous orthodontic13 and dental14,15 studies and was selected as the more conservative approach, given the lack of a common ICC for different orthodontic outcomes. The value of “m” was calculated for each study from the number of teeth or mini-implants contributing to each patient unit. Finally, the required number of patients for each CRT was recalculated by multiplying the design effect by the number calculated by the authors of the original CRT publication, as described above. For each CRT, the resulting increase in sample size was divided by the number calculated by the authors of the original publication to obtain the percentage increase in participants required to maintain the same statistical power.
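A minimal sketch of this per-trial recalculation in R is shown below; the cluster size and originally calculated sample size are hypothetical placeholders, and only the ICC of 0.1 reflects the assumption actually used in this study.

```r
# Recalculation of the required number of participants for a single CRT,
# following the procedure described above (the trial values are hypothetical).
icc <- 0.1        # assumed intra-cluster correlation coefficient
m <- 6            # teeth or mini-implants contributing per patient (cluster size)
n_original <- 40  # number of participants calculated by the original trial authors

design_effect <- 1 + (m - 1) * icc          # 1.5 with these inputs
n_required <- n_original * design_effect    # 60 participants (round up in practice)
pct_increase <- 100 * (n_required - n_original) / n_original  # 50% increase needed
c(design_effect, n_required, pct_increase)
```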
Descriptive statistics for the included studies were provided as medians and interquartile ranges (IQRs). Associations between optimal sample size calculation in CRTs and study characteristics were planned to be examined using statistical testing; however, this was not feasible, as only one trial reported the design effect and performed an optimal calculation of the CRT sample size. Five CRTs were excluded from the sample size recalculation because they lacked sufficient sample size details. A sensitivity analysis was conducted to isolate the effect of the simple parallel design on the recalculation of the sample size. All statistical analyses were conducted using Stata 15.1 (Stata Corp., College Station, TX, USA) and the R statistical package (version 4.3.0; R Foundation for Statistical Computing, Vienna, Austria).
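A minimal sketch of how such recalculated values could be summarized with medians and IQRs, including the parallel-design subset used in the sensitivity analysis, is shown below; the small data frame is a hypothetical stand-in for the extracted dataset, not the actual study data.

```r
# Summarizing recalculated sample sizes across trials with medians and IQRs,
# plus the parallel-design subset used in the sensitivity analysis.
# The data frame is a hypothetical stand-in for the extracted dataset.
trials <- data.frame(
  design = c("parallel", "parallel", "split-mouth", "crossover"),
  m = c(4, 10, 12, 6),             # teeth or mini-implants per cluster
  n_original = c(30, 40, 26, 60)   # sample size reported in the original paper
)
icc <- 0.1
trials$design_effect <- 1 + (trials$m - 1) * icc
trials$n_required <- trials$n_original * trials$design_effect
trials$pct_increase <- 100 * (trials$n_required - trials$n_original) / trials$n_original

median_iqr <- function(x) quantile(x, probs = c(0.25, 0.50, 0.75))
median_iqr(trials$pct_increase)                               # all CRTs
median_iqr(trials$pct_increase[trials$design == "parallel"])  # sensitivity analysis
```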
After restricting the search results to the aforementioned journals, 323 articles were screened. One hundred and fifty-one articles were excluded after title and abstract reading, and 67 were excluded after full-text reading for various reasons (Appendix 2). One hundred and five CRTs were eligible for inclusion and data extraction (Figure 1).
Within this cohort, 100 CRTs (95.2%) reported a sample size calculation. The included trials had a median of four participating authors (IQR: 3–6) and mostly originated from Europe (48/105; 45.7%). Most CRTs were single-center trials (99/105; 94.3%) with a parallel design (76/105; 72.4%) and two arms (84/105; 80.0%). More than half (58/105; 55.2%) had prior protocol registration, whereas approximately one-third (33/105; 31.4%) did not report protocol registration (Table 1).
Table 1. Characteristics of included cluster randomized trials according to whether a sample size calculation was reported
Characteristic | Overall (n = 105) | Not reported (n = 5) | Reported (n = 100) |
---|---|---|---|
Authors’ number | 4 (3, 6) | 4 (2, 5) | 4 (3, 6) |
Continent | |||
Americas | 18 (17.1) | 0 (0) | 18 (18.0) |
Asia/others | 39 (37.1) | 2 (40.0) | 37 (37.0) |
Europe | 48 (45.7) | 3 (60.0) | 45 (45.0) |
Journal | ||
AJODO | 31 (29.5) | 2 (40.0) | 29 (29.0) |
AO | 35 (33.3) | 1 (20.0) | 34 (34.0) |
EJO | 27 (25.7) | 2 (40.0) | 25 (25.0) |
KJO | 5 (4.8) | 0 (0) | 5 (5.0) |
OCR | 1 (1.0) | 0 (0) | 1 (1.0) |
PIO | 6 (5.7) | 0 (0) | 6 (6.0) |
Publication year | |||
2017 | 8 (7.6) | 2 (40.0) | 6 (6.0) |
2018 | 18 (17.1) | 3 (60.0) | 15 (15.0) |
2019 | 13 (12.4) | 0 (0) | 13 (13.0) |
2020 | 15 (14.3) | 0 (0) | 15 (15.0) |
2021 | 20 (19.0) | 0 (0) | 20 (20.0) |
2022 | 14 (13.3) | 0 (0) | 14 (14.0) |
2023 | 17 (16.2) | 0 (0) | 17 (17.0) |
Centers | |||
Multi | 6 (5.7) | 1 (20.0) | 5 (5.0) |
Single | 99 (94.3) | 4 (80.0) | 95 (95.0) |
Number of arms | |||
2 | 84 (80.0) | 3 (60.0) | 81 (81.0) |
3 | 15 (14.3) | 2 (40.0) | 13 (13.0) |
4 | 6 (5.7) | 0 (0) | 6 (6.0) |
Design | |||
Crossover | 2 (1.9) | 0 (0) | 2 (2.0) |
Parallel | 76 (72.4) | 4 (80.0) | 72 (72.0) |
Split mouth | 27 (25.7) | 1 (20.0) | 26 (26.0) |
Protocol registration | |||
Yes | 58 (55.2) | 1 (20.0) | 57 (57.0) |
No | 14 (13.3) | 0 (0) | 14 (14.0) |
Not reported | 33 (31.4) | 4 (80.0) | 29 (29.0) |
Values are presented as median (interquartile range) or number (%).
AJODO, American Journal of Orthodontics and Dentofacial Orthopedics; AO, The Angle Orthodontist; EJO, European Journal of Orthodontics; KJO, Korean Journal of Orthodontics; OCR, Orthodontics & Craniofacial Research; PIO, Progress in Orthodontics.
Of the included CRTs that reported the sample size calculation, 31 of 100 (31.0%) based the calculation on effect size, 44 of 100 (44.0%) reported the mean difference, and 9.0% did not report any effect measure. More than half of the included CRTs opted for 80% power to calculate the sample size, whereas a few CRTs did not report the power assumptions at all (2.0%). The vast majority of the included CRTs used the value 0.05 for alpha (type I error) to estimate the sample size, while a few CRTs (8.0%) did not report a significance level (Table 2). Only one included CRT13 reported the design effect and adjusted the sample size accordingly.
Table 2. Reporting of the sample size calculation in cluster randomized trials, where feasible (n = 100)
Item | n = 100 |
---|---|
Effect measure | |
Effect size | 31 (31.0) |
Mean difference | 44 (44.0) |
Relative risk reduction | 4 (4.0) |
Risk difference | 12 (12.0) |
ni | 9 (9.0) |
Value of the effect measure | |
Effect size | 0.50 (0.43, 0.80) |
Mean difference | 1.04 (0.50, 2.00) |
Relative risk reduction | 0.15 (0.08, 0.20) |
Risk difference | 0.25 (0.20, 0.66) |
Level of significance (α) | |
0.001 | 1 (1.0) |
0.01 | 3 (3.0) |
0.0125 | 1 (1.0) |
0.025 | 1 (1.0) |
0.05 | 86 (86.0) |
Not reported | 8 (8.0) |
Power | |
80% | 60 (60.0) |
81–85% | 11 (11.0) |
90% | 19 (19.0) |
> 90% | 8 (8.0) |
Not reported | 2 (2.0) |
Accounting for cluster effect | |
Yes | 1 (1.0) |
No | 99 (99.0) |
ICC | |
None | 100 (100.0) |
Values are presented as number (%) or median (interquartile range).
ICC, intra-cluster correlation coefficient; ni, no information.
Table 3 lists the parameters used to recalculate the sample size. The median number of required participants after recalculation was 67.6, greater than the median number of participants reported in the included papers (40 participants). In other words, the median required increase in sample size was 50% (IQR: 30%, 90%), based on the number of teeth in each cluster and an assumed ICC of 0.1, to maintain the same power and level of statistical significance (Figure 2). A sensitivity analysis restricted to the 72 parallel-design studies yielded similar results, with a median required increase of 50% (IQR: 30%, 120%).
Table 3. Recalculation of sample size and sensitivity analysis for CRTs with a parallel design
Parameter | Recalculation (100 CRTs) | Sensitivity analysis (72 CRTs) |
---|---|---|
Design effect | 1.5 (1.3, 1.9) | 1.5 (1.3, 2.2) |
Number of individuals per cluster | 6 (4, 10) | 6 (4, 13) |
Number of clusters | 18.5 (12.5, 27.0) | 18.0 (14.0, 24.5) |
Sample size in the paper | 40 (26.5, 59.0) | 40 (30.0, 57.5) |
Number of required participants | 67.6 (36.2, 108.0) | 68.5 (36.9, 114.0) |
Required increase in sample size | 50% (30%, 90%) | 50% (30%, 120%) |
Values are presented as median (interquartile range).
CRTs, cluster randomized trials.
The present study confirmed that expected sample sizes were miscalculated in orthodontic CRTs published over the last 7 years; a typical flaw was underestimation of the actual requirements, such that planned samples would need to be inflated by a median of 50%.
The cluster design is frequently encountered in orthodontic and dental RCTs5,6 because several teeth from the same individual are allocated to an intervention and constitute subunits of the patient-cluster. Consequently, cluster data carry less unique information than independent data, and a mandatory increase in sample size is required to compensate for the clustering effect in CRTs.16 The design effect, the standard correction factor for adjusting sample size calculations in CRTs, was rarely reported in the present sample. This raises concerns about whether the implications of the cluster design are actually recognized when it is employed in orthodontics, and it reflects a lack of awareness of potential clustering effects in orthodontic RCTs,5 starting from the sample size assumptions. Ignoring the data structure and the correlation arising from multiple measurements was also evident in longitudinal and repeated-measures designs in orthodontics:8 none of the 147 trials included in that assessment reported an optimal calculation. A recent empirical report that examined clustering effects across all types of studies published in three orthodontic journals over a 3-year period found that only one-fifth to one-fourth of published research accounted for clustering in sample size calculations; however, no recalculation was attempted and CRTs were not explicitly assessed, so further direct comparisons with that report cannot be made.17 By comparison, a previous healthcare report found that the elements specific to CRTs were the worst reported components of sample size calculations, with only 22% of trials reporting all recommended elements.18 Similarly, in dentistry and orthodontics, handling participants as clusters in specific situations still appears to be difficult, or the theoretical and scientific background of the different study design structures is insufficiently understood.
Accurate and transparent reporting of the sample size calculation is essential for RCTs according to the Consolidated Standards of Reporting Trials (CONSORT) group.19 One might argue that this assessment confirms a substantial improvement compared with a study of sample size calculation in orthodontic RCTs undertaken 10 years ago.20 That earlier study found incomplete reporting of the sample size components in 70% of the included RCTs, whereas the corresponding figure was less than 10% in our assessment of CRTs. This comparison should be interpreted with caution because only one specific design was included in the present study. Moreover, cluster trial reporting requires additional details, namely the number of clusters, the cluster size (usually the number of teeth in orthodontics), and the ICC, according to the CONSORT extension for cluster designs.21 A previous study22 found that journals promoting CONSORT adherence are associated with superior reporting of RCTs. However, a survey23 found that only 12 of 165 high-impact journals mentioned the extension for cluster trials in their online instructions for authors. Thus, more rigorous editorial policies regarding CONSORT extensions are required to bring about a substantial improvement in CRT reporting.
It is worth noting that a higher ICC value or a larger number of teeth per cluster (m) requires a larger sample size to maintain the same study power. Failing to increase the sample size may yield an underpowered trial; for perspective, increasing the power from 50% to 80% requires approximately a two-fold increase in trial size.24 The present study found that the number of participants in orthodontic CRTs should be increased by a median of 50% to maintain the planned statistical power. This finding was also confirmed when we restricted the analysis to the simplest design assessed, the parallel-arm design, to avoid any influence of the more complex structures encountered, which would potentially require additional parameters and further implicate between-cluster variability. Consistent with previous studies,20,25 the majority of the included trials assumed a significance level (alpha) of 0.05 and a power of 80%. Thirty-one CRTs (31/100; 31.0%) based their calculation on an effect size, rather than a mean or risk difference drawn from previous studies; notably, a larger assumed effect size yields a smaller required sample size.26 In some of these RCTs the assumed effect size was large (up to 0.8), thus targeting a small sample size. When planning and designing a study, practices such as the so-called “sample size samba”, in which the effect size is incrementally retrofitted to arrive at a more convenient sample size, have been heavily criticized and linked to flawed approaches and malpractice in research conduct.24
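As an aside, the approximate two-fold relationship mentioned above can be checked with base R's power.t.test(); the standardized effect size of 0.5 used here is purely illustrative.

```r
# Checking the approximate two-fold increase in trial size when moving from
# 50% to 80% power (two-sample t-test, illustrative effect size of 0.5 SD).
n_50 <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.50)$n
n_80 <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)$n
c(ceiling(n_50), ceiling(n_80))  # per-group sizes, roughly 32 and 64
n_80 / n_50                      # ratio close to 2
```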
A potential limitation of this study is that the relevant records were retrieved from a single database; thus, some studies might have been missed. Nevertheless, all the targeted orthodontic journals are indexed in MEDLINE, and the timeframe assessed was long, covering the last 7 years of publication records. Moreover, reporting of the cluster design is still lacking, which makes searching within journals and other databases challenging. Even so, a clear picture of non-optimal sample size calculations in orthodontic CRTs emerged from both the main and sensitivity analyses conducted in the present report. Notwithstanding, the aim of this assessment was to shed light on the problem and raise awareness, rather than to provide an exact estimate of sample size miscalculation in orthodontic CRTs. The study design and its variants, statistical power, ICC, and variability between and within clusters all play a vital role in adjusting sample sizes in CRTs.
We documented empirical evidence that sample size calculations in cluster randomized orthodontic trials are suboptimal. A greater understanding of cluster design and all the parameters required to undertake the correct sample size calculation is of paramount importance. The CONSORT statement extension for cluster design should be more closely adhered to by authors and journal editors when such studies are submitted for publication to support credible findings and appropriate inferences disseminated to the scientific community.
Conceptualization: SM, DK, MA. Data curation: All authors. Formal analysis: SM, DK. Investigation: HK, MA. Methodology: All authors. Project administration: DK, SM. Resources: SM. Software: SM, DK. Supervision: DK. Validation: MA, HK. Visualization: SM. Writing–original draft: SM, HK. Writing–review & editing: SM, DK, MA.
No potential conflict of interest relevant to this article was reported.
None to declare.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.