k12eval.org
Free resource · living index

Open datasets for AI grading research, in one place.

There is no single home for the open K-12 essay and writing corpora that AI grading research is built on. Datasets are scattered across Kaggle, Figshare, GitHub, arXiv, lab sites, and state assessment vendors. K12Eval maintains this list as a public service. Free to use. Credited to original authors. Updated as new corpora drop.

Datasets indexed: 11
Scored responses covered: 60K+
Languages represented: EN / PT
How this list is curated: a corpus is included if it contains scored student-written responses, is publicly downloadable (or accessible upon request from the original host), and has been used in published or community AI grading research. K12Eval does not redistribute the data. Each row links you to the original source.
01
LEAR Lab Dataset
LEAR Lab
Essays · Writing · Rubric-scored
02
DREsS Dataset
DREsS (arXiv 2402.16733)
Essays · EFL / ELL · Rubric-scored
03
ASAP Dataset
Automated Student Assessment Prize (Kaggle / Hewlett)
Essays · K-12 · Writing · Rubric-scored
04
ASAP (Figshare Pre-Processed)
Figshare
Essays · Writing · Rubric-scored
05
Automated Essay Grading (GitHub)
NishantSushmakar on GitHub
Essays · Code / baseline · Writing
06
ELL Feedback Prize (Kaggle)
Kaggle / Learning Agency Lab
Essays · K-12 · EFL / ELL · Trait-scored
07
Learning Agency Lab — Automated Essay Scoring 2.0
The Learning Agency Lab (Kaggle competition)
Essays · K-12 · Writing · Rubric-scored
08
Essay-BR Dataset · Bonus
Catalpa-CL
Essays · K-12 · Bilingual · Writing
09
IELTS Writing Scored Essays (Kaggle)
Kaggle
Essays · Writing · Rubric-scored · High-stakes
10
PARCC Released Items (New Meridian)
New Meridian
K-12 · High-stakes · Rubric-scored
11
ASAP 2.0 Corpus
scrosseye on GitHub
Essays · K-12 · Writing · Rubric-scored

Each dataset is credited to its original authors; K12Eval only curates and links. If a link breaks or a corpus moves, please let us know.

Suggest a dataset
Beyond the index

This list is the starting point. K12Eval is building what comes next.

The aggregated list above is a free public service. The rest of K12Eval (the production corpus, the IRR-validated subsets, the shared methodology, and the public leaderboard) picks up where these legacy datasets stop.

© 2026 The K12Eval Project · Convened by cograder · Public benefit research initiative