The NSF-funded Big Data NorthEast Spoke is a project that promotes the merging of big data and education. We run workshops that teach big data methods and host competitions that draw new researchers to the field.

We focus on building capacity in data-driven education by sharing educational databases, managing yearly data competitions, and conducting educational data science workshops and hackathons. Measurable results include analyzing gigabytes of data to create actionable recommendations for classroom teachers, make accurate predictions about students, develop new artificial intelligence (AI) methods for education, and build new data science tool sets. Key outcomes include introducing many researchers to educational big data, learning analytics, and models of teaching interventions.

This project will indirectly improve classroom learning by leveraging the unique types of data available from digital education to better understand students, groups, and the settings in which they learn. So far, we have run one competition using the ASSISTments longitudinal data set. Seventy-four researchers from around the world participated, and some of the winners have written papers that are being submitted for publication to the Journal of Educational Data Mining.

This research addresses several grand challenges in education:
1) Predict future student outcomes, such as college attendance and choice of college major, from existing large-scale longitudinal educational data sets that follow the same thousands of students over time.
2) Help teachers make sense of dense online data so that it can inform their teaching, e.g., what they should say or do in response to student activity.
3) Provide personalized instruction to each student, using big data that represents student skills and behavior to infer the cognitive, motivational, and metacognitive factors in each student's learning.