12 changes: 9 additions & 3 deletions config/_default/menus.toml
@@ -331,22 +331,28 @@
weight = 69
parent = "replications"

[[main]]
name = "Replication Network Blog"
url = "/replication-hub/blog"
weight = 70
parent = "replications"

[[main]]
name = "Replication Research Journal"
url = "https://replicationresearch.org"
weight = 70
weight = 71
parent = "replications"

[[main]]
name = "Replication Manuscript Templates"
url = "https://osf.io/brxtd/"
weight = 71
weight = 72
parent = "replications"

[[main]]
name = "Submit a replication to FReD"
url = "/replication-hub/submit"
weight = 72
weight = 73
parent = "replications"


21 changes: 21 additions & 0 deletions content/replication-hub/blog/_index.md
@@ -0,0 +1,21 @@
---
title: "Replication Network Blog"
date: 2025-11-04
type: blog
url: "/replication-hub/blog/"
---

Welcome to the Replication Network Blog, a collection of guest posts, perspectives, and discussions on replication research, reproducibility, and open science practices.

Browse through our archive of articles covering topics including:
- Replication studies and methodologies
- Statistical considerations in replication research
- Peer review and publishing practices
- Meta-science and research quality
- Teaching and learning about replications

The blog features contributions from researchers, statisticians, and practitioners who share their insights and experiences with replication research across various disciplines.

---

## Recent Blog Posts
@@ -0,0 +1,75 @@
---
title: "ANDERSON & MAXWELL: There’s More than One Way to Conduct a Replication Study – Six, in Fact"
date: 2017-02-28
author: "The Replication Network"
tags:
- "GUEST BLOGS"
- "confidence intervals"
- "equivalence tests"
- "p-value"
- "replication"
- "significance testing"
draft: false
type: blog
---

*NOTE: This entry is based on the article, “There’s More Than One Way to Conduct a Replication Study: Beyond Statistical Significance” (Psychological Methods, 2016, Vol. 21, No. 1, 1-12)*

Following a large-scale replication project in economics (Chang & Li, 2015) that successfully replicated only a third of 67 studies, a recent headline boldly reads, “The replication crisis has engulfed economics” (Ortman, 2015). Several fields are suffering from a “crisis of confidence” (Pashler & Wagenmakers, 2012, p. 528), as widely publicized replication projects in psychology and medicine have shown similarly disappointing results (e.g., Open Science Collaboration, 2015; Prinz, Schlange, & Asadullah, 2011). There are certainly a host of factors contributing to the crisis, but there is a silver lining: the recent increase in attention toward replication has allowed researchers to consider various ways in which replication research can be improved. Our article (Anderson & Maxwell, 2016, *Psychological Methods*) sheds light on one potential way to broaden the effectiveness of replication research.

In our article, we take the perspective that replication has often been narrowly defined. Namely, if a replication study is statistically significant, it is considered successful, whereas if it does not meet the significance threshold, it is considered a failure. However, replication need not be defined solely by this significant-versus-nonsignificant distinction. We posit that what constitutes a successful replication can vary based on a researcher’s specific goal. We outline six replication goals and provide details on the statistical analysis for each, noting that these goals are by no means exhaustive.

Deeming a replication successful when the result is statistically significant is indeed merited in a number of situations (Goal 1). For example, consider the case where two competing theories are pitted against each other. In this situation, we argue that it is the direction of the effect, rather than its magnitude, that validates one theory over the other. Significance-based replication can be quite informative in these cases. However, even in this situation, a nonsignificant result should not be taken to mean that the replication was a failure. Researchers who want to show that a reported effect is null can consider Goal 2.

In Goal 2, researchers are interested in showing that an effect does not exist. Although some researchers seem to be aware that this is a valid goal, their choice of analysis often only fails to reject the null, which is rather weak evidence for nonreplication. We encourage researchers who would like to show that a claimed effect is null to use an equivalence test or Bayesian methods (e.g., ROPE, Kruschke, 2011; Bayes factors, Rouder & Morey, 2012), both of which can reliably show that an effect is essentially zero, rather than simply that it is not statistically significant.
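
As a rough illustration (not part of the original article), the sketch below implements the two one-sided tests (TOST) approach to equivalence for two independent means in Python; the equivalence bounds `low` and `high` are hypothetical margins that a replication team would need to justify in advance, and the pooled-variance t-test is only one of several reasonable choices.

```python
import numpy as np
from scipy import stats

def tost_ind(x1, x2, low, high):
    """Two one-sided tests (TOST) for equivalence of two independent means.
    Equivalence is supported when BOTH one-sided tests reject, i.e. when the
    returned p-value (the larger of the two) falls below alpha."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    diff = x1.mean() - x2.mean()
    # pooled-variance standard error of the mean difference
    sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    p_lower = stats.t.sf((diff - low) / se, df)    # H1: true difference > low
    p_upper = stats.t.cdf((diff - high) / se, df)  # H1: true difference < high
    return max(p_lower, p_upper)

# Example with a hypothetical +/- 0.2 raw-unit equivalence margin
rng = np.random.default_rng(1)
p_equiv = tost_ind(rng.normal(0, 1, 200), rng.normal(0, 1, 200), low=-0.2, high=0.2)
```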

Goal 3 involves accurately estimating the magnitude of a claimed effect. Research has shown that effect sizes in published research are upwardly biased (Lane & Dunlap, 1978; Maxwell, 2004), and effect sizes from underpowered studies may have wide confidence intervals. Thus, a replication researcher may have reason to question the reported effect size of a study and desire to obtain a more accurate estimate of the effect. Researchers with this goal in mind can use accuracy in parameter estimation (AIPE; Maxwell, Kelley, & Rausch, 2008) approaches to plan their sample sizes so that a desired degree of precision in the effect size estimate can be achieved. In the analysis phase, we encourage these researchers to report a confidence interval around the replication effect size. Thus, successful replication for Goal 3 is defined by the degree of precision in estimating the effect size.
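
Purely as a sketch of the idea (not the AIPE procedure of Maxwell, Kelley, & Rausch, 2008 itself), the code below uses a common large-sample approximation to the variance of a standardized mean difference, first to put a confidence interval around an observed d and then to find the smallest per-group sample size whose expected half-width meets a target precision; `d_planning` and `halfwidth` are values the researcher must supply.

```python
import numpy as np
from scipy import stats

def cohens_d_ci(d, n1, n2, conf=0.95):
    """Approximate CI for a standardized mean difference (Cohen's d),
    using a common large-sample variance approximation."""
    se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    return d - z * se, d + z * se

def n_per_group_for_halfwidth(d_planning, halfwidth, conf=0.95):
    """Smallest equal per-group n whose approximate CI half-width for d
    is no larger than the target (precision-based planning, normal approx.)."""
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    n = 2
    while z * np.sqrt(2 / n + d_planning**2 / (4 * n)) > halfwidth:
        n += 1
    return n
```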

Goal 4 involves combining data from a replication study with a published original study, effectively conducting a small meta-analysis on the two studies. Importantly, access to the raw data from the original study is often not necessary. This approach is in keeping with the idea of continuously cumulating meta-analysis (CCMA; Braver, Thoemmes, & Rosenthal, 2014), wherein each new replication can be incorporated into the previous knowledge. Researchers can report a confidence interval around the average (weighted) effect size of the two studies (e.g., Bonett, 2009). This goal begins to correct some of the issues associated with underpowered studies, even when only a single replication study is involved. For example, Braver and colleagues (2014) illustrate a situation in which the *p*-value combining original and replication studies (*p* = .016) was smaller than both the original study (*p* = .033) and the replication study (*p* = .198), emphasizing the power advantage of this technique.
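
A minimal sketch of the underlying arithmetic (not code from Braver et al., 2014) is an inverse-variance weighted average of the original and replication effect sizes with a Wald-type confidence interval; it assumes the effect sizes and their sampling variances are already on a common metric.

```python
import numpy as np
from scipy import stats

def fixed_effect_combine(effects, variances, conf=0.95):
    """Inverse-variance (fixed-effect) average of effect sizes from the
    original study and its replication(s), with a Wald-type CI."""
    effects = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    est = np.sum(w * effects) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    return est, (est - z * se, est + z * se)

# e.g. a hypothetical original d = 0.45 (var 0.04) combined with a replication d = 0.10 (var 0.02)
combined, ci = fixed_effect_combine([0.45, 0.10], [0.04, 0.02])
```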

In Goal 5, researchers aim to show that a replication effect size is inconsistent with that of the original study. A simple difference in statistical significance is not suited for this goal. In fact, the difference between a statistically significant and nonsignificant finding is not necessarily statistically significant (Gelman & Stern, 2006). Rather, we encourage researchers to consider testing the difference in effect sizes between the two studies, using a confidence interval approach (e.g., Bonett, 2009). Although some authors declare a replication to be a failure when the replication effect size is smaller in magnitude than that reported by the original study, testing the difference in effect sizes for significance is a much more precise indicator of replication success in this situation. Specifically, a nominal difference in effect sizes does not imply that the effects differ statistically (Bonett & Wright, 2007).
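
Again as an illustrative sketch rather than Bonett’s (2009) exact procedure, a Wald-type confidence interval for the difference between two independent effect sizes looks like the following; an interval that excludes zero is evidence that the replication effect is inconsistent with the original.

```python
import numpy as np
from scipy import stats

def effect_size_difference_ci(d_orig, var_orig, d_rep, var_rep, conf=0.95):
    """Wald-type CI for the difference between two independent effect sizes
    (original minus replication). An interval excluding zero suggests the
    replication effect is inconsistent with the original effect."""
    diff = d_orig - d_rep
    se = np.sqrt(var_orig + var_rep)
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    return diff, (diff - z * se, diff + z * se)
```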

Finally, Goal 6 involves showing that a replication effect is consistent with the original effect. In a combination of the recommended analyses for Goals 2 and 5, we recommend conducting an equivalence test on the difference in effect sizes. Authors who declare their replication study successful when the effect size appears similar to the original study could benefit from knowledge of these analyses, as descriptively similar effect sizes may statistically differ.
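
Combining the two previous sketches (again a normal-approximation illustration, not a prescribed analysis), a TOST-style test on the difference between the two effect sizes, with a user-chosen equivalence `margin`, can support a claim of consistency:

```python
from scipy import stats

def effect_size_equivalence_p(d_orig, var_orig, d_rep, var_rep, margin):
    """TOST-style equivalence test on the difference between two effect sizes
    (normal approximation): a small returned p-value supports the claim that
    the replication effect is within +/- margin of the original effect."""
    diff = d_orig - d_rep
    se = (var_orig + var_rep) ** 0.5
    p_lower = stats.norm.sf((diff + margin) / se)   # H1: difference > -margin
    p_upper = stats.norm.cdf((diff - margin) / se)  # H1: difference <  margin
    return max(p_lower, p_upper)
```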

We hope that the broader view of replication that we present in our article allows researchers to expand their goals for replication research as well as utilize more precise indicators of replication success and non-success. Although recent replication attempts have painted a grim picture in many fields, we are confident that the recent emphasis on replication will bring about a literature in which readers can be more confident, in economics, psychology, and beyond.

*Scott Maxwell is Professor and Matthew A. Fitzsimon Chair in the Department of Psychology at the University of Notre Dame. Samantha Anderson is a PhD student, also in the Department of Psychology at Notre Dame. Correspondence about this blog should be addressed to her at Samantha.F.Anderson.350@nd.edu.*

**REFERENCES**

Bonett, D. G. (2009). Meta-analytic interval estimation for standardized and unstandardized mean differences. *Psychological Methods, 14*, 225–238. doi:10.1037/a0016619

Bonett, D. G., & Wright, T. A. (2007). Comments and recommendations regarding the hypothesis testing controversy. *Journal of Organizational Behavior, 28*, 647–659. doi:10.1002/job.448

Braver, S. L., Thoemmes, F. J., & Rosenthal, R. (2014). Continuously cumulating meta-analysis and replicability. *Perspectives on Psychological Science, 9*, 333–342. doi:10.1177/1745691614529796

Chang, A. C., & Li, P. (2015). Is Economics Research Replicable? Sixty Published Papers from Thirteen Journals Say “Usually Not”. *Finance and Economics Discussion Series 2015-083*. Washington: Board of Governors of the Federal Reserve System. doi:10.17016/FEDS.2015.083

Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself statistically significant. *The American Statistician, 60*, 328–331. doi:10.1198/000313006X152649

Kruschke, J. K. (2011). Bayesian assessment of null values via parameter estimation and model comparison. *Perspectives on Psychological Science, 6*, 299–312. doi:10.1177/1745691611406925

Lane, D. M., & Dunlap, W. P. (1978). Estimating effect size: Bias resulting from the significance criterion in editorial decisions. *British Journal of Mathematical and Statistical Psychology, 31*, 107–112. doi:10.1111/j.2044-8317.1978.tb00578.x

Maxwell, S. E. (2004). The persistence of underpowered studies in psychological research: Causes, consequences, and remedies. *Psychological Methods, 9*, 147–163. doi:10.1037/1082-989X.9.2.147

Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample size planning for statistical power and accuracy in parameter estimation. *Annual Review of Psychology, 59*, 537–563. doi:10.1146/annurev.psych.59.103006.093735

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. *Science, 349*, aac4716. doi:10.1126/science.aac4716

Ortman, A. (2015, November 2). *The replication crisis has engulfed economics*. Retrieved from <http://theconversation.com/the-replication-crisis-has-engulfed-economics-49202>

Pashler, H., & Wagenmakers, E. J. (2012). Editors’ introduction to the special section on replicability in psychological science: A crisis of confidence? *Perspectives on Psychological Science, 7*, 528–530. doi:10.1177/1745691612465253

Prinz, F., Schlange, T., & Asadullah, K. (2011). Believe it or not: How much can we rely on published data on potential drug targets? *Nature Reviews Drug Discovery, 10*, 712–713.

Rouder, J. N., & Morey, R. D. (2012). Default Bayes factors for model selection in regression. *Multivariate Behavioral Research, 47*, 877–903. doi:10.1080/00273171.2012.734737

@@ -0,0 +1,43 @@
---
title: "AoI*: “: Comparing Human-Only, AI-Assisted, and AI-Led Teams on Assessing Research Reproducibility in Quantitative Social Science” by Brodeur et al. (2025)"
date: 2025-01-18
author: "The Replication Network"
tags:
- "GUEST BLOGS"
- "AI"
- "AI-assisted research"
- "AI-led analysis"
- "Artificial Intelligence"
- "Human vs AI collaboration"
- "Quantitative social science"
- "Reproducibility assessment"
draft: false
type: blog
---

*[\*AoI = “Articles of Interest” is a feature of TRN where we report abstracts of recent research related to replication and research integrity.]*

**ABSTRACT (taken from [the article](https://www.econstor.eu/bitstream/10419/308508/1/I4R-DP195.pdf))**

“This study evaluates the effectiveness of varying levels of human and artificial intelligence (AI) integration in reproducibility assessments of quantitative social science research.”

“We computationally reproduced quantitative results from published articles in the social sciences with 288 researchers, randomly assigned to 103 teams across three groups — human-only teams, AI-assisted teams and teams whose task was to minimally guide an AI to conduct reproducibility checks (the “AI-led” approach).”

“Findings reveal that when working independently, human teams matched the reproducibility success rates of teams using AI assistance, while both groups substantially outperformed AI-led approaches (with human teams achieving 57 percentage points higher success rates than AI-led teams, 𝒑 < 0.001).”

“Human teams were particularly effective at identifying serious problems in the analysis: they found significantly more major errors compared to both AI-assisted teams (0.7 more errors per team, 𝒑 = 0.017) and AI-led teams (1.1 more errors per team, 𝒑 < 0.001). AI-assisted teams demonstrated an advantage over more automated approaches, detecting 0.4 more major errors per team than AI-led teams (𝒑 = 0.029), though still significantly fewer than human-only teams. Finally, both human and AI-assisted teams significantly outperformed AI-led approaches in both proposing (25 percentage points difference, 𝒑 = 0.017) and implementing (33 percentage points difference, 𝒑 = 0.005) comprehensive robustness checks.”

“These results underscore both the strengths and limitations of AI assistance in research reproduction and suggest that despite impressive advancements in AI capability, key aspects of the research publication process still require substantial human involvement.”

**REFERENCE**

[Brodeur, Abel et al. (2025): Comparing Human-Only, AI-Assisted, and AI-Led Teams on Assessing Research Reproducibility in Quantitative Social Science. I4R Discussion Paper Series, No. 195, Institute for Replication (I4R), s.l.](https://www.econstor.eu/bitstream/10419/308508/1/I4R-DP195.pdf)

@@ -0,0 +1,38 @@
---
title: "AoI*: “Conventional Wisdom, Meta-Analysis, and Research Revision in Economics” by Gechert  et al. (2023)"
date: 2023-12-27
author: "The Replication Network"
tags:
- "GUEST BLOGS"
- "Conventional wisdom"
- "maer-net"
- "Meta-analysis"
- "publication bias"
draft: false
type: blog
---

*[\*AoI = “Articles of Interest” is a feature of TRN where we report excerpts of recent research related to replication and research integrity.]*

**EXCERPT (taken from the [article](https://www.econstor.eu/bitstream/10419/280745/1/Meta-analysis-review.pdf))**

“The purpose of this study is to compare the findings of influential meta-analyses to the ‘conventional wisdom’ about the same economic question or issue. What have we learned from meta-analyses of economics? How do their results differ from the conventional, textbook understanding of economics?”

“We identify ‘influential’ meta-analyses as those with at least 100 citations that were published in 2000 or later, and those that were recommended by a survey of members of the Meta-Analysis of Economics Research Network (MAER-Net).”

“Out of the full sample of 360 studies, 72 studies cover a general interest topic in economics and include original empirical estimates for a certain effect size. We narrow down further to those meta-analyses that provide both a simple mean of the original effect size and a corrected mean, controlling for publication bias or other biases. This gives us a final list of 24 studies covering the fields of growth and development, finance, public finance, education, international, labor, behavioral, gender, environmental, and regional/urban economics.”

“We compare the central findings of the meta-analyses to ‘conventional wisdom’ as classified by: (1) a widely recognized seminal paper or authoritative literature review; (2) the assessment of an artificial intelligence (AI), the GPT-4 Large Language Model (LLM); and (3) the simple unweighted average of reported effects included in the meta-analysis.”

“For 17 of these 24 studies, the corrected effect size is substantially closer to zero than commonly thought, or even switches sign. Statistically significant publication bias is prevalent in 17 of the 24 studies. Overall, we find that 16 of 24 studies show both a clear reduction in effect size and a statistically significant publication bias. Comparing the best estimate from the meta-analysis with the conventional wisdom from the reference study, the GPT-4 estimate, or the simple unweighted average, the relative reduction in the effect size is in the range of 45-60% in all three comparison cases.”

**REFERENCE**

[Gechert, S., Mey, B., Opatrny, M., Havranek, T., Stanley, T. D., Bom, P. R., … & Rachinger, H. J. (2023). Conventional Wisdom, Meta-Analysis, and Research Revision in Economics](https://www.econstor.eu/bitstream/10419/280745/1/Meta-analysis-review.pdf).
