Automating Effect Size Extraction from Research Articles

Build an end-to-end pipeline that finds the relevant predictor–outcome effect size(s) for a research question, converts to Pearson’s r when needed, and outputs a single aggregate effect size per study.

Competition Overview

Conducting meta-analyses in psychology is time consuming because manually coding effect sizes from articles is complex and open-ended. Studies often report many scales that can map onto multiple constructs, so meta-analysts must carefully determine which reported relationship matches a given research question and then extract the correct effect size.

This year's SIOP Machine Learning Competition focuses on automating the article coding process: given a set of articles and a corresponding research question, your pipeline must extract the relevant predictor–dependent variable relationship effect size(s), convert them to Pearson’s r if needed, and report a single aggregated effect size per paper.

Competition format: Competitors receive a development dataset to build and test their pipeline and, during the final week, a test dataset to generate official predictions.

Key Dates & Submission Limits

3/7: Competition begins @ 5pm Eastern; Dev dataset released (limit: 5 submissions/day, 100 submissions total)
4/4: Test dataset released @ 5pm Eastern (limit: 3 submissions total)
4/11: Competition ends @ 11:59pm Eastern
4/12: Winners notified via email
4/13: Winning solution verification begins
4/17: Winning solution verification completes
4/30: Winning solutions presented at the 2026 SIOP conference

Scoring

Each study may report multiple effect sizes relevant to the research question. For example, suppose Study 1 used Scale A to measure the predictor and Scales B, C, and D to measure the criterion, yielding effect sizes r_AB, r_AC, and r_AD. The average of those observed correlations is the study's true aggregate score. Your pipeline predicts each true aggregate score by averaging across the effects it extracts. Pipelines are evaluated as a whole using the mean squared error (MSE) between predicted and true aggregate scores:

$$ \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{r}_i - r_i\right)^2 $$
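For concreteness, here is a minimal sketch of the metric in Python (numpy assumed; the effect size values are hypothetical, not from the competition data):

    import numpy as np

    def mse(predicted, true):
        """Mean squared error between predicted and true aggregate effect sizes."""
        predicted = np.asarray(predicted, dtype=float)
        true = np.asarray(true, dtype=float)
        return float(np.mean((predicted - true) ** 2))

    # Using the three-scale study above: the true aggregate is the mean
    # of r_AB, r_AC, and r_AD (hypothetical values).
    true_aggregate = np.mean([0.20, 0.30, 0.10])  # 0.20
    print(mse([0.25], [true_aggregate]))          # 0.0025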

Submission Format (CSV)

Submissions must be uploaded as a CSV file containing exactly two columns: studyid and aggregateeffectsize. Each row should correspond to one study, where aggregateeffectsize is your predicted aggregate Pearson’s r for that study.

Example CSV:

    studyid,aggregateeffectsize
    study1,0.23
    study2,-0.11
    study3,0.00
Tip: Ensure the header names match exactly (studyid, aggregateeffectsize) and that every study in the dataset appears once.
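For a quick sanity check before uploading, a short pandas-based sketch (the prediction values below are placeholders):

    import pandas as pd

    predictions = {"study1": 0.23, "study2": -0.11, "study3": 0.00}

    df = pd.DataFrame(
        {"studyid": list(predictions.keys()),
         "aggregateeffectsize": list(predictions.values())}
    )

    # Header names must match exactly, and every study must appear once.
    assert list(df.columns) == ["studyid", "aggregateeffectsize"]
    assert df["studyid"].is_unique

    df.to_csv("submission.csv", index=False)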

Eligibility

Design Constraints

Participants design a function (in a programming language of their choice) that accepts:

  1. PDF file name
  2. Research question description (Must be the same for all papers examining the same research question)
  3. Predictor description (Must be the same for all papers examining the same research question)
  4. Dependent variable description (Must be the same for all papers examining the same research question)

The function must then, fully computationally and automatically:

  1. Locate the reported relationship(s) that match the research question,
  2. Convert each extracted effect size to Pearson's r when needed, and
  3. Output a single aggregate effect size for the study.

Any approach is allowed as long as it runs end to end with no human in the loop; an illustrative skeleton follows.
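To make the expected interface concrete, here is a Python sketch. The helper names (Effect, extract_effects) are assumptions for illustration, not part of the competition spec; the extraction step is the open-ended part each team must design. The d-to-r conversion shown is the standard meta-analytic formula assuming equal group sizes.

    import statistics
    from dataclasses import dataclass

    @dataclass
    class Effect:
        """One extracted effect size and the metric it was reported in."""
        value: float
        metric: str  # e.g., "r" or "d"

    def d_to_r(d: float) -> float:
        """Convert Cohen's d to Pearson's r (equal group sizes assumed)."""
        return d / (d ** 2 + 4) ** 0.5

    def extract_effects(pdf_path, research_question, predictor, dv):
        """Placeholder for PDF parsing and relationship matching;
        this is where each team's own approach goes."""
        raise NotImplementedError

    def predict_aggregate_effect_size(pdf_path: str,
                                      research_question: str,
                                      predictor: str,
                                      dependent_variable: str) -> float:
        """Return one aggregate Pearson's r for the study in pdf_path."""
        effects = extract_effects(pdf_path, research_question,
                                  predictor, dependent_variable)
        rs = [e.value if e.metric == "r" else d_to_r(e.value)
              for e in effects]
        return statistics.mean(rs)  # average across all matching effects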

Instructions

  1. Create a Team: Register on the Sign-up Page and keep your team token handy; you will need it for dataset downloads and submissions.

  2. Download the Dev Dataset (available 3/7 @ 5pm Eastern):

    Enter your team token below and download the Dev dataset.


  3. Build Your Pipeline: Implement a single-file function that takes a PDF + research question + predictor/criterion descriptions, extracts the relevant relationship(s), converts to Pearson’s r when needed, and outputs one aggregate r per study.

  4. Submit Dev Predictions (3/7–4/4): Submit a CSV file with study IDs and predicted aggregate effect sizes. Dev submissions are limited to 5 per day and 100 total. Use the Submission Page to upload.

  5. Download the Test Dataset (available 4/4 @ 5pm Eastern):

    Enter your team token below and download the Test dataset.

    If you have issues accessing any of the articles, please send a message to ivanhernandez@vt.edu

  6. Submit Test Predictions (4/4–4/11): Submit a CSV file with study IDs and predicted aggregate effect sizes. Test submissions are limited to 3 total, so submit carefully; no exceptions will be made to provide contestants with additional submissions under any circumstances. Use the Submission Page to upload.

  7. Final Submission Deadline: Ensure all official submissions are complete by 4/11. Winners will be notified by email on 4/12 and solutions will undergo verification afterward.
Sign Up Your Team Now