Certificate of Completion

THIS ACKNOWLEDGES THAT

Sam Schiavone

HAS COMPLETED THE FALL 2025 DATA SCIENCE BOOT CAMP

DIRECTOR: Roman Holowinsky, PhD

DATE: NOVEMBER 13, 2025

TEAM

LingPredict Project: Do Developmental Norms Predict L2 Difficulty? Modeling Duolingo Learners with WordBank Features

Sara Sanchez-Alonso, Benard Haugen, Vikram Jambulapati, Manjeet Kaur, Sam Schiavone

Field: Language Learning

Research Question:
Do words acquired later in first-language (L1) development show higher error rates in second-language (L2) practice on Duolingo?

Duolingo SLAM Dataset:
• Large corpus of data from over 6,000 Duolingo users, collected during their first 30 days of learning a language.
• Released in 2018 as part of the Duolingo Shared Task on Second Language Acquisition Modeling (SLAM).
• Publicly available via Harvard Dataverse and linked from Duolingo Research.
• Freely usable for research/educational purposes (requires agreeing to terms of use).

WordBank Dataset:
• Open database of children’s vocabulary growth.
• Publicly available at wordbank.stanford.edu.
• Aggregates data from the MacArthur–Bates Communicative Development Inventories (CDIs).
• Open access under a permissive license (for research/educational use).
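
One way the research question could be approached with these two datasets is sketched below. This is only a minimal illustration, not the team's actual method: the token records, the age-of-acquisition values, and the helper names (`error_rates`, `pearson`) are all hypothetical stand-ins, and a real analysis would load the SLAM and WordBank files and likely use a richer model than a single correlation.

```python
from collections import defaultdict

# Hypothetical toy records: (word, was_error) pairs standing in for
# token-level Duolingo SLAM exercise outcomes. Not real data.
slam_tokens = [
    ("dog", 0), ("dog", 0), ("dog", 1), ("dog", 0),
    ("milk", 0), ("milk", 1),
    ("because", 1), ("because", 1), ("because", 0), ("because", 1),
]

# Hypothetical age of acquisition in months, standing in for a value
# derived from WordBank CDI data. Not real data.
aoa_months = {"dog": 16, "milk": 18, "because": 30}


def error_rates(tokens):
    """Return the fraction of erroneous attempts for each word."""
    counts = defaultdict(lambda: [0, 0])  # word -> [errors, attempts]
    for word, was_error in tokens:
        counts[word][0] += was_error
        counts[word][1] += 1
    return {word: errs / n for word, (errs, n) in counts.items()}


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rates = error_rates(slam_tokens)

# Keep only words present in both sources before comparing them.
shared = sorted(set(rates) & set(aoa_months))
r = pearson([aoa_months[w] for w in shared], [rates[w] for w in shared])
print(f"Words compared: {len(shared)}, correlation: {r:.3f}")
```

A positive correlation on real data would be consistent with later-acquired L1 words being harder in L2 practice, though it would not by itself rule out confounds such as word frequency or length.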

©2017-2026 by The Erdős Institute.
