Simulated Type I Error Rates for Unbalanced Cluster Samples

We performed a simulation study to compare the Type I error performance of ten analytic methods for cluster randomized designs. The analytic methods were applied to a cluster randomized design with two treatment groups and one level of clustering, under several scenarios of cluster size imbalance. The lookup table below allows scientists to investigate the Type I error performance of the analytic methods in scenarios which best match their intended design. It is most useful for researchers who know the anticipated cluster sizes and correlation for their planned study.

Lookup Table of Type I Error Values

The table below shows all scenarios from our simulation study of Type I error rates in unbalanced cluster samples. To examine Type I error rates for your specific study design, filter the table based on the parameters of imbalance listed above the table.

For example, if you anticipate low intracluster correlation in your study, select an intracluster correlation of 0.001 from the dropdown list. If you also expect about 30 participants per cluster, then you would select values for nbar1 and nbar2 closest to 30. From the values available in the simulation study, a cluster size of 32 would be the best choice. You may filter by additional parameters as needed to match your planned study.

Once you have filtered the results, examine the Type I error rates in the table for each method. Methods which have Type I error rates closest to 0.05 will provide the best Type I error control for your design.
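The two steps described above, choosing the closest available parameter value and then comparing methods against 0.05, can be sketched in a few lines of Python. This is only an illustration: the helper names `nearest` and `rank_methods` are hypothetical, the list of cluster sizes is assumed (only 32 is mentioned in the text), and the error rates are made-up placeholders, not results from the simulation study.

```python
def nearest(available, target):
    """Return the available value closest to the target."""
    return min(available, key=lambda v: abs(v - target))


def rank_methods(rows, nominal=0.05):
    """Sort (method, type_i_error) pairs by distance from the nominal level."""
    return sorted(rows, key=lambda r: abs(r[1] - nominal))


# Assumed grid of simulated cluster sizes; only 32 appears in the text.
sizes = [8, 32, 128]
chosen = nearest(sizes, 30)

# Placeholder error rates for illustration only, NOT study results.
rows = [(1, 0.049), (3, 0.061), (5, 0.054)]
ranked = rank_methods(rows)
```

Here `chosen` is the cluster size to select in the filter, and the first entry of `ranked` is the method whose placeholder error rate lies closest to 0.05.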

Column Filter
Intracluster correlation (rho)
Number of clusters in treatment group 1 (m1)
Number of clusters in treatment group 2 (m2)
Average cluster size in treatment group 1 (nbar1)
Average cluster size in treatment group 2 (nbar2)
The ratio of maximum to minimum cluster size in treatment group 1 (r1)
The ratio of maximum to minimum cluster size in treatment group 2 (r2)

Summary of Statistical Methods

The table below summarizes each of the ten statistical methods tested. One-stage models are linear mixed models, which account for within-cluster correlation using a random intercept. Two-stage models first calculate the average outcome within each cluster, and then analyze the resulting cluster means using a general linear univariate model.
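As a rough illustration of the two-stage idea, the sketch below reduces each cluster to its mean and then compares the two groups of cluster means. It assumes the simplest case of equal weights: with two treatment groups, an unweighted linear model on cluster means reduces to a pooled-variance two-sample t statistic. The function name `two_stage_t` and the data layout (a list of clusters, each a list of outcomes) are assumptions made for this example, not part of the study's software.

```python
from math import sqrt
from statistics import mean, variance


def two_stage_t(group1, group2):
    """Two-stage comparison; each group is a list of clusters (lists of outcomes)."""
    # Stage 1: reduce each cluster to its mean outcome.
    means1 = [mean(cluster) for cluster in group1]
    means2 = [mean(cluster) for cluster in group2]
    m1, m2 = len(means1), len(means2)

    # Stage 2: unweighted pooled-variance comparison of the cluster means.
    df = m1 + m2 - 2
    pooled = ((m1 - 1) * variance(means1) + (m2 - 1) * variance(means2)) / df
    t = (mean(means1) - mean(means2)) / sqrt(pooled * (1 / m1 + 1 / m2))
    return t, df


# Toy data with unequal cluster sizes, for illustration only.
g1 = [[1, 2, 3], [2, 4], [3, 3, 3, 3]]
g2 = [[4, 6], [5, 5, 5], [7]]
t_stat, df = two_stage_t(g1, g2)
```

The statistic would then be referred to a t distribution with `df` degrees of freedom. The weighted two-stage methods differ in how much each cluster mean counts in the second stage.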

Method | Model | Details
1 | One-stage | Mixed model with Kenward-Roger denominator degrees of freedom, variance constrained positive
2 | One-stage | Mixed model with Kenward-Roger denominator degrees of freedom, unconstrained variance
3 | One-stage | Mixed model with denominator degrees of freedom $m - g$, variance constrained positive
4 | One-stage | Mixed model with denominator degrees of freedom $m - g$, unconstrained variance
5 | Two-stage | General linear model with weight matrix $W = I_m$
6 | Two-stage | General linear model with weight matrix $W = \operatorname{diag}(n_{hi})$ over $h = 1, \dots, g$ and $i = 1, \dots, m_h$
7 | Two-stage | General linear model with weight matrix $W = \operatorname{diag}(1 / n_{hi})$ over $h = 1, \dots, g$ and $i = 1, \dots, m_h$
8 | Two-stage | General linear model with weight matrix $W = \operatorname{diag}\left( \left[ n_{hi} (n_{hi} - 1) \right] / \left[ y_{hi}' y_{hi} - n_{hi}\, y_{1,hi} \right] \right)$ over $h = 1, \dots, g$ and $i = 1, \dots, m_h$
9 | Two-stage | General linear model with weight matrix $W = \operatorname{diag}\left[ \left( \hat{\sigma}_c^2 + \hat{\sigma}_e^2 / n_{hi} \right)^{-1} \right]$ over $h = 1, \dots, g$ and $i = 1, \dots, m_h$, and variance constrained positive
10 | Two-stage | General linear model with weight matrix $W = \operatorname{diag}\left[ \left( \hat{\sigma}_c^2 + \hat{\sigma}_e^2 / n_{hi} \right)^{-1} \right]$ over $h = 1, \dots, g$ and $i = 1, \dots, m_h$, and unconstrained variance
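
To make the weighting schemes concrete, the sketch below builds the diagonal of the weight matrix for methods 5, 6, 7, 9, and 10 from a list of cluster sizes. It rests on a reading of the table above in which each diagonal entry corresponds to one cluster of size n_hi. The function name `weight_diagonal` and its arguments are hypothetical, and the variance estimates for methods 9 and 10 are taken as inputs rather than estimated here. Method 8 is omitted because it also requires the raw outcomes.

```python
def weight_diagonal(method, sizes, var_c=None, var_e=None):
    """Return the diagonal of W for one two-stage method, one entry per cluster."""
    if method == 5:
        # Identity weights: every cluster mean counts equally.
        return [1.0 for _ in sizes]
    if method == 6:
        # Weights proportional to cluster size.
        return [float(n) for n in sizes]
    if method == 7:
        # Weights proportional to the reciprocal of cluster size.
        return [1.0 / n for n in sizes]
    if method in (9, 10):
        # Reciprocal of (between-cluster variance + within-cluster variance / n).
        # Methods 9 and 10 differ only in how var_c is estimated upstream.
        if var_c is None or var_e is None:
            raise ValueError("methods 9 and 10 need var_c and var_e")
        return [1.0 / (var_c + var_e / n) for n in sizes]
    raise ValueError("unsupported method: %r" % (method,))


# Illustrative cluster sizes and variance values, not taken from the study.
w6 = weight_diagonal(6, [2, 4])
w7 = weight_diagonal(7, [2, 4])
w9 = weight_diagonal(9, [2, 4], var_c=1.0, var_e=2.0)
```

With balanced cluster sizes, all of these schemes give every cluster the same weight, so the choice among them matters only when cluster sizes are unequal.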