Build an end-to-end pipeline that finds the relevant predictor–outcome effect size(s) for a research question, converts them to Pearson's r when needed, and outputs a single aggregate effect size per study.
Conducting meta-analyses in psychology is time-consuming because manually coding effect sizes from articles is complex and open-ended. Studies often report many scales that can map onto multiple constructs, so meta-analysts must carefully determine which reported relationship matches a given research question and then extract the correct effect size.
This year's SIOP Machine Learning Competition focuses on automating the article coding process: given a set of articles and a corresponding research question, your pipeline must extract the relevant predictor–dependent variable relationship effect size(s), convert them to Pearson’s r if needed, and report a single aggregated effect size per paper.
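When a study reports a different effect size metric, standard conversion formulas apply; for example, Cohen's $d$ converts to $r$ via $r = d/\sqrt{d^2 + a}$, where $a = (n_1+n_2)^2/(n_1 n_2)$, reducing to $a = 4$ for equal group sizes. A minimal sketch in Python (the function name is illustrative, not part of any competition API):

```python
import math

def cohens_d_to_r(d: float, n1: int | None = None, n2: int | None = None) -> float:
    """Convert Cohen's d to Pearson's r.

    Uses a = (n1 + n2)**2 / (n1 * n2) when group sizes are known,
    and the equal-group-size default a = 4 otherwise.
    """
    a = 4.0 if n1 is None or n2 is None else (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d ** 2 + a)

print(cohens_d_to_r(0.5))          # equal groups assumed -> ~0.243
print(cohens_d_to_r(0.5, 40, 20))  # unequal groups (a = 4.5) -> ~0.229
```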
Competition format: Competitors receive a development dataset to build and test their pipeline and, during the final week, a test dataset to generate official predictions.
| Date | Event | Submission Limits |
|---|---|---|
| 3/7 | Competition begins @ 5pm Eastern; Dev dataset released | 5 per day, 100 total |
| 4/4 | Test dataset released @ 5pm Eastern | 3 total |
| 4/11 | Competition ends @ 11:59pm Eastern | — |
| 4/12 | Winners notified via email | — |
| 4/13 | Winning solution verification begins | — |
| 4/17 | Winning solution verification completes | — |
| 4/30 | Winning solutions presented at the 2026 SIOP conference | — |
Each study might have multiple effect sizes relevant to the research question. For example, suppose Study 1 used Scale A to measure the predictor and Scales B, C, and D to measure the criterion; the reported effect sizes are $r_{AB}$, $r_{AC}$, and $r_{AD}$, and the true aggregate score is the average of those observed correlations. Your pipeline predicts the true aggregate score by averaging across the effects it extracts. Pipelines are evaluated as a whole using the mean squared error (MSE) between predicted and true aggregate scores:
$$ \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{r}_i - r_i\right)^2 $$
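As a concrete illustration of the scoring (all values below are made up), aggregating one study's extracted correlations and computing MSE over a set of predictions might look like:

```python
# Aggregate one study's extracted effect sizes by simple averaging.
extracted = [0.30, 0.22, 0.26]  # e.g., r_AB, r_AC, r_AD (illustrative values)
aggregate = sum(extracted) / len(extracted)

# Score predicted aggregates against the true aggregates with MSE.
def mse(predicted: list[float], true: list[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(predicted, true)) / len(true)

print(aggregate)                           # 0.26
print(mse([0.26, -0.10], [0.25, -0.11]))   # 0.0001
```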
Submissions must be uploaded as a CSV file containing exactly two columns: `studyid` and `aggregateeffectsize`. Each row should correspond to one study, where `aggregateeffectsize` is your predicted aggregate Pearson's r for that study.
Example CSV:

```csv
studyid,aggregateeffectsize
study1,0.23
study2,-0.11
study3,0.00
```
Tip: Ensure the header names match exactly (`studyid`, `aggregateeffectsize`) and that every study in the dataset appears once.
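A minimal sketch of writing a valid submission file using only the standard library (the `predictions` dict stands in for your pipeline's output):

```python
import csv

predictions = {"study1": 0.23, "study2": -0.11, "study3": 0.00}  # placeholder output

with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["studyid", "aggregateeffectsize"])  # headers must match exactly
    for study_id, r in predictions.items():
        writer.writerow([study_id, r])
```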
Participants design a function (in a programming language of their choice) that accepts:
- the article(s) associated with a study, and
- the corresponding research question.

The function must then (fully computationally and automatically):
- identify the reported relationship(s) that match the research question,
- extract the associated effect size(s), converting them to Pearson's r where needed, and
- aggregate them into a single effect size for the study.
Any approach is allowed as long as it runs fully computationally and automatically, with no manual coding of effect sizes. A minimal skeleton of such a function is sketched below.
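In this skeleton, articles are assumed to be plain text, and `extract_candidate_effects` is a hypothetical stand-in (here a naive regex) for whatever matching and extraction logic your pipeline implements:

```python
import re

def extract_candidate_effects(article_text: str, research_question: str) -> list[float]:
    """Hypothetical stand-in: a real pipeline would match reported relationships
    to the research question (e.g., via an LLM or NLP rules). This naive version
    simply collects every 'r = .XX' value in the text."""
    return [float(m) for m in re.findall(r"r\s*=\s*(-?\.\d+)", article_text)]

def predict_aggregate_effect_size(article_text: str, research_question: str) -> float:
    """Return the predicted aggregate Pearson's r for one study."""
    rs = extract_candidate_effects(article_text, research_question)
    return sum(rs) / len(rs) if rs else 0.0  # average the matched correlations

print(predict_aggregate_effect_size(
    "Scale A with B: r = .30; with C: r = .22.", "A-criterion relationship"))  # 0.26
```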
Enter your team token to download the Dev dataset (and, during the final week, the Test dataset).
If you have issues accessing any of the articles, please send a message to ivanhernandez@vt.edu.