OUR DESIGN PARAMETERS & EFFECT SIZE BENCHMARKS

Without good estimates, power analysis is only guesswork.

— David M. Murray (1998, p. 350)

 

The value of a power analysis stands or falls with the quality of its input: garbage in, garbage out. Reasonable assumptions about design parameters and effect sizes are therefore key to planning strong experimental studies. Design parameters reflect the (clustered) variance structure of an intervention’s target outcome. They include intraclass correlation coefficients (ICCs) ρ, which quantify outcome differences between clusters, and R² values, which quantify the proportion of outcome variance explained by covariates (at the various hierarchical levels).
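To illustrate how ρ and the R² values enter such a computation, the sketch below implements the standard minimum detectable effect size (MDES) formula for a two-level cluster-randomized trial with treatment assigned at the cluster level (cf. Bloom et al., 2007; Hedges & Hedberg, 2007). The function name and all parameter values are illustrative, and a normal-approximation multiplier is used in place of the exact t-quantiles with J − 2 degrees of freedom.

```python
from statistics import NormalDist

def mdes_two_level(rho, r2_l1, r2_l2, j, n, p=0.5, alpha=0.05, power=0.80):
    """Approximate standardized MDES for a two-level cluster-randomized
    trial (students within schools, treatment at the school level).

    rho    : intraclass correlation coefficient (ICC)
    r2_l1  : proportion of student-level variance explained by covariates
    r2_l2  : proportion of school-level variance explained by covariates
    j      : total number of schools, split by treatment proportion p
    n      : students per school
    """
    z = NormalDist()
    # Normal-approximation multiplier (adequate for larger j; exact
    # formulas use t-quantiles with j - 2 degrees of freedom).
    m = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    var = (rho * (1 - r2_l2) / (p * (1 - p) * j)
           + (1 - rho) * (1 - r2_l1) / (p * (1 - p) * j * n))
    return m * var ** 0.5

# Illustrative values only: rho = .20, 40 schools with 25 students each.
no_cov = mdes_two_level(rho=0.20, r2_l1=0.0, r2_l2=0.0, j=40, n=25)
pretest = mdes_two_level(rho=0.20, r2_l1=0.5, r2_l2=0.8, j=40, n=25)
```

With these (hypothetical) inputs, a strong covariate set shrinks the MDES substantially relative to the no-covariate design, which is precisely why well-matched empirical R² estimates matter when planning a trial.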

Ideally, the estimates entered into power computations should match the features of the planned randomized trial (RT): its target population, the outcome of interest, the hierarchical structure and randomization scheme, the ensuing analysis model, and any covariates applied (e.g., Bloom et al., 2007; Campbell et al., 2000; Cohen, 1988; Hedges & Hedberg, 2007; Lipsey et al., 2012; Murray, 1998; Schochet, 2008; Spybrook et al., 2016; Zhang et al., 2023). However, since the true quantities are by definition unknown a priori, the assumed values for design parameters and effect sizes are inevitably subject to uncertainty, no matter how well informed and (empirically and theoretically) justified a researcher’s assumptions are (this is also referred to as the local optimization problem; Du & Wang, 2016; Moerbeek & Teerenstra, 2016).

 

To optimally support evaluation researchers in designing strong RTs that allow valid causal inferences about educational and psychological interventions, we have compiled extensive databases of versatile single- and multilevel design parameters and effect size benchmarks (with corresponding standard errors), along with concise guidance on their application and selection.


Note

There are several resources for empirical estimates of ρ and R² for educational and psychological outcomes. An overview of existing international and German research on design parameters for student achievement can be found in Stallasch (2024) and, in more detail and meta-analytically aggregated, in Stallasch et al. (2021) and Stallasch et al. (2024). The research review in Brunner et al. (2025) also addresses socio-emotional learning outcomes. Similarly, Brunner et al. (2023) summarize available effect size benchmarks for student achievement.

Design Parameters for the German School Context

Stallasch et al. (2021)

Our comprehensive set of multilevel design parameters for student achievement in elementary and secondary school

In Stallasch et al. (2021), we generated ρ and R2 values with corresponding standard errors for student achievement as the intervention’s target outcome. We specified two-level (students within schools) and three-level (students within classrooms within schools) latent covariate models to analyze design parameters for a broad array of mathematical-scientific, verbal, and domain-general achievement outcomes. The estimates apply to several (sub)populations across the entire school career from Grade 1 to 12. We cover the three most important covariate sets: pretests, sociodemographic characteristics, and the combination thereof. We utilized data from five national probability samples of three longitudinal German large-scale assessments (DESI, NEPS, PISA-I+ 2003), and additionally summarized the estimates in terms of normative distributions.

The design parameters are presented in an interactive Excel file. This Supplemental Online Material B comes with a detailed introduction on the application scopes for the various sets of estimates. The document can be downloaded from the study’s OSF repo or the Journal’s website.

Figure 1: Previews of the design parameters from Stallasch et al. (2021)

Stallasch et al. (2024)

Our (meta-analytic) single- and multilevel design parameters and selection guidelines for various covariate types, combinations, and time lags

In Stallasch et al. (2024), we specifically focused on covariates in randomized trials on student achievement. We accumulated a vast collection of R2 values with corresponding standard errors. Specifically, we referred to influential psychometric heuristics to analyze design parameters and develop selection guidelines for covariate types of varying bandwidth-fidelity, covariate combinations to quantify their incremental validities, and covariate time lags of 1 to 7 years to examine their validity degradation. We applied (manifest) single-level (students assumed to be independent), two-level (students within schools), and three-level (students within classrooms within schools) modeling and considered various mathematical-scientific and verbal achievement outcomes. Data stemmed from six national probability samples of three longitudinal German large-scale assessments (DESI, NEPS, PISA-I+ 2003, PISA-I+ 2012). The empirically estimated design parameters were meta-analytically integrated and employed in precision simulations.

The design parameters (and simulation results) are listed in interactive Excel files. The empirical estimates can be found in Online Supplemental Material E, and the meta-analytic estimates in Online Supplemental Material F (and the simulation results in Online Supplemental Material G). These resources can be downloaded from the study’s OSF repo.

Figure 2: Previews of the empirically estimated design parameters from Stallasch et al. (2024)

Figure 3: Previews of the meta-analytically integrated design parameters from Stallasch et al. (2024)

Brunner et al. (2025)

Our (meta-analytic) design parameters for cognitive and socio-emotional learning outcomes in preschool

In Brunner et al. (2025), we estimated and meta-analyzed ρ and R² values for preschool children’s cognitive as well as socio-emotional learning outcomes. For this purpose, we drew on a systematic collection of individual participant data from four German probability samples of 2- to 6-year-olds. The estimates are relevant for planning single-level (e.g., in non-clustered lab-based settings), two-level (children nested within daycare centers), and three-level (children nested within groups within daycare centers) RTs. The analyzed outcomes were assessed via three methods (standardized tests, parent ratings, and educator ratings). We also considered vital covariate sets (baseline measures, sociodemographic characteristics, and their combination).

Both the empirically estimated and meta-analytically integrated design parameters are compiled in an Excel file featuring interactive tables with various filtering and selection functionalities. This allows users to easily access the design parameters that align with specific characteristics of the target intervention. The respective Online Supplemental Material B can be downloaded from the study’s OSF repo.

Figure 4: Previews of the design parameters from Brunner et al. (2025)

Effect Size Benchmarks for the German School Context

Brunner et al. (2023)

Our novel (meta-analytic) effect sizes to interpret results from intervention studies on student achievement across the entire school career

In Brunner et al. (2023), we gathered a vast collection of meta-analytically aggregated effect size benchmarks for various achievement outcomes of 1st to 12th graders, a resource that is especially valuable as it is the first of its kind for the German school system. Specifically, we generated estimates of normative expectations for students’ academic growth as well as of performance gaps between weak and average schools or between policy-relevant, demographically defined groups. We used longitudinal data from six large probability samples of four German large-scale assessments (NEPS, PISA-I+ 2003, PISA-I+ 2012, and DESI).

The effect size benchmarks are listed in Tables 1 to 5 of the article, available on the Journal’s website.

Figure 5: Previews of the effect size benchmarks from Brunner et al. (2023)

Further effect size benchmarks (e.g., from previous research, for specific school types and federal states) are documented in various Excel files. The respective Online Supplemental Materials B, C, and D can be downloaded from the study’s OSF repo or from the Journal’s website.

Figure 6: Previews of further effect size benchmarks from Brunner et al. (2023)

International Design Parameters

Brunner et al. (2018)

Our world-wide diversified multilevel design parameters for student achievement, affect and motivation, and learning strategies

In their cross-country research, Brunner et al. (2018) estimated a broad spectrum of two-level (students within schools) ρ and R2 values for 15-year-olds’ domain-specific and domain-general academic outcomes. The authors capitalized on representative, cross-sectional large-scale assessment data for 81 nations and economies from five PISA cycles in the years 2000, 2003, 2006, 2009, and 2012. Sociodemographic characteristics were employed as covariates. The estimates are also summarized in terms of normative distributions.

The design parameters are provided as an extensive Excel file, which can be downloaded from the Journal’s website.

Figure 7: Previews of the design parameters from Brunner et al. (2018)

References

Bloom, H. S., Richburg-Hayes, L., & Black, A. R. (2007). Using covariates to improve precision for studies that randomize schools to evaluate educational interventions. Educational Evaluation and Policy Analysis, 29(1), 30–59. https://doi.org/10.3102/0162373707299550
Brunner, M., Keller, U., Wenger, M., Fischbach, A., & Lüdtke, O. (2018). Between-school variation in students’ achievement, motivation, affect, and learning strategies: Results from 81 countries for planning group-randomized trials in education. Journal of Research on Educational Effectiveness, 11(3), 452–478. https://doi.org/10.1080/19345747.2017.1375584
Brunner, M., Stallasch, S. E., Artelt, C., & Lüdtke, O. (2025). An individual participant data meta-analysis to support power analyses for randomized intervention studies in preschool: Cognitive and socio-emotional learning outcomes. Educational Psychology Review, 37(1), 6. https://doi.org/10.1007/s10648-024-09981-z
Brunner, M., Stallasch, S. E., & Lüdtke, O. (2023). Empirical benchmarks to interpret intervention effects on student achievement in elementary and secondary school: Meta-analytic results from Germany. Journal of Research on Educational Effectiveness, 17(1), 119–157. https://doi.org/10.1080/19345747.2023.2175753
Campbell, M., Grimshaw, J., Steen, N., & Changing Professional Practice in Europe Group (EU BIOMED II Concerted Action). (2000). Sample size calculations for cluster randomised trials. Journal of Health Services Research & Policy, 5(1), 12–16. https://doi.org/10.1177/135581960000500105
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). L. Erlbaum Associates.
Du, H., & Wang, L. (2016). A Bayesian power analysis procedure considering uncertainty in effect size estimates from a meta-analysis. Multivariate Behavioral Research, 51(5), 589–605. https://doi.org/10.1080/00273171.2016.1191324
Hedges, L. V., & Hedberg, E. C. (2007). Intraclass correlation values for planning group-randomized trials in education. Educational Evaluation and Policy Analysis, 29(1), 60–87. https://doi.org/10.3102/0162373707299706
Lipsey, M. W., Puzio, K., Yun, C., Hebert, M. A., Steinka-Fry, K., Cole, M. W., Roberts, M., Anthony, K. S., & Busick, M. D. (2012). Translating the statistical representation of the effects of education interventions into more readily interpretable forms. National Center for Special Education Research. http://eric.ed.gov/?id=ED537446
Moerbeek, M., & Teerenstra, S. (2016). Power analysis of trials with multilevel data. CRC Press.
Murray, D. M. (1998). Design and analysis of group-randomized trials. Oxford University Press.
Schochet, P. Z. (2008). Statistical power for random assignment evaluations of education programs. Journal of Educational and Behavioral Statistics, 33(1), 62–87. https://doi.org/10.3102/1076998607302714
Spybrook, J., Shi, R., & Kelcey, B. (2016). Progress in the past decade: An examination of the precision of cluster randomized trials funded by the U.S. Institute of Education Sciences. International Journal of Research & Method in Education, 39(3), 255–267. https://doi.org/10.1080/1743727X.2016.1150454
Stallasch, S. E. (2024). Optimizing power analysis for randomized experiments: Design parameters for student achievement (pp. ix, 224) [Doctoral thesis, Universität Potsdam]. https://doi.org/10.25932/publishup-62939
Stallasch, S. E., Lüdtke, O., Artelt, C., & Brunner, M. (2021). Multilevel design parameters to plan cluster-randomized intervention studies on student achievement in elementary and secondary school. Journal of Research on Educational Effectiveness, 14(1), 172–206. https://doi.org/10.1080/19345747.2020.1823539
Stallasch, S. E., Lüdtke, O., Artelt, C., Hedges, L. V., & Brunner, M. (2024). Single- and multilevel perspectives on covariate selection in randomized intervention studies on student achievement. Educational Psychology Review, 36(4), 112. https://doi.org/10.1007/s10648-024-09898-7
Zhang, Q., Spybrook, J., Kelcey, B., & Dong, N. (2023). Foundational methods: Power analysis. In R. J. Tierney, F. Rizvi, & K. Ercikan (Eds.), International encyclopedia of education (4th ed., pp. 784–791). Elsevier. https://doi.org/10.1016/B978-0-12-818630-5.10088-0