
Table 3 Overview of the reviewed publications

From: Integrating multiple data sources for learning analytics—review of literature

| Publication | Issue addressed in publication | Types of data sources | Types of data | Data sources integrated? | Data integration approach (manual or automatic) | Methods used | Number of participants whose records were analyzed |
|---|---|---|---|---|---|---|---|
| Lopez Guarin, Guzman, and Gonzalez (2015) | Predict the loss of academic status at a certain time | Student information system (× 2) | Student background information (× 2), performance test data, final grades | Yes | Automatic. Admissions data sets are first joined into one table, which is then joined with academic information | Decision trees, naive Bayes | 1532 students |
| Park, Yu, and Jo (2016) | Classify blended learning courses in a Korean higher education institution | LMS (× 2) | Activity log, course data | Yes | Automatic (most likely; not explicitly stated). Course data and log data are combined on course ID (anonymized) | Latent class analysis | N/A (records of 612 courses found suitable for analysis) |
| Thompson, Kennedy-Clark, Wheeler, and Kelly (2014) | Automatic part-of-speech tagging of text to identify the types of micro-events that learners enact and to determine whether learners complete functions that are crucial for task success | Corpus (× 2) (both are mini-corpora of collaborative problem-based learning activities) | Text (× 2) | No (data are analyzed separately) | Data are not integrated | Part-of-speech tagger trained on the Penn Treebank corpora; visualization of the timing and speaker of each utterance in one mini-corpus | Corpus 1: 6 dyads in total (12 students + teacher); corpus 2: four postgraduate students |
| Zheng, Bender, and Nadershahi (2017) | Describe faculty’s application of digital tools and assess the impact of the lecture annotation tool on students’ learning behavior, using data extracted from these tools | LMS, lecture annotation tool | Activity log data (× 2) | No (data are analyzed separately) | Data are not integrated | N/A | N/A |
| Pardos and Kao (2015) | Bayesian network analysis to assess students’ current and prior knowledge for problems in a MOOC (with visualization), and visualization of course structure (not based on the preceding analysis) | MOOC (× 2) | Activity log, student background information (possibly more) | No. The platform can currently integrate edX MOOC data only with other edX MOOC data; it also supports Coursera | Automatic for integrating edX MOOC data with other edX data (not addressed for Coursera MOOC data). Approach: a HarvardX tool integrates different types of edX files into one CSV file (loosely based on xAPI); for visualizations, the CSV file(s) are read into memory | Bayesian network analysis, visualization | N/A |
| Liu et al. (2017) | Examine the use of an adaptive system by analyzing usage patterns | Student information system, adaptive platform, LMS, performance test | Student background information, activity log, performance test data (× 3) | Yes | N/A (the publication explicitly mentions combining data but does not specify how) | Spearman correlation, visualizations, regression analyses | 128 first-year students entering a pharmacy program |
| Raca, Tormey, and Dillenbourg (2016) | Compare student behaviors (levels of movement) and relate them to (self-reported) attention | Video, questionnaire | Video-derived data, questionnaire data | Yes | N/A | Descriptive statistics (e.g., mean, percentage), correlations | 56 bachelor-level students |
| Di Mitri et al. (2017) | Predict learners’ performance during self-regulated learning | Wristband (physiological signals), software tracking tool, questionnaire, weather information | Physiological arousal data, software category, questionnaire data, location data, weather data | Yes | Automatic. A tool (Learning Pulse Server) imports data from different APIs and stores events in a Learning Record Store (xAPI format) | Linear mixed-effects models | 9 PhD students (the multimodal data set originally contained approximately 10,000 records) |
| Ochoa et al. (2018) | Provide automatic feedback on oral presentation skills | Video, audio, presentation slides | Video-derived data, audio-derived data, presentation-slide-derived data | No (data are analyzed separately) | Data/data sources are not integrated | Various classification algorithms (e.g., random forest) | 83 engineering students |
| Hutt et al. (2017) | Detect mind wandering during a lecture using eye tracking | Eye tracker, questionnaire | Eye tracker data, questionnaire data | Yes | N/A | Bayesian network classifier | 32 undergraduate students from a Canadian university |
| Jayaprakash, Moody, Lauría, Regan, and Baron (2014) | Detect students who are in academic difficulty | LMS, student information system | Activity log data, partial course grades, course data, student background information (× 2) | Yes | Automatic, using Pentaho Business Intelligence Data Integration (an ETL approach) | Logistic regression, support vector machines, J48, naive Bayes | 15,150 undergraduate students |
| Rodríguez-Triana, Prieto, Martínez-Monés, Asensio-Pérez, and Dimitriadis (2018) | Identify deviations between the desired learning state (based on learning design) and the actual state in blended/CSCL scenarios | LMS, wiki, online writing application, attendance list, human observation, instructional design information, questionnaire | Activity log data (× 2), attendance information, teacher comments, instructional design information, questionnaire data | Yes | Automatic (at least in part). Third-party tools were integrated into a virtual learning environment (GLUE) | N/A (three binary classifiers were built to identify deviations between the desired and actual learning states) | 165 students |
| Gray, McGuinness, Owende, and Hofmann (2016) | Predict at-risk students | Student information system, questionnaire, exam results | Student background information, questionnaire data, GPA | Yes | N/A | Correlations, t test/ANOVA, classification (e.g., naive Bayes, decision trees) | 1207 first-year students (records from 2010 to 2012) |
| Wang, Paquette, and Baker (2014) | Identify career paths of MOOC learners | MOOC, organization member information | Student background information, questionnaire data, organization member information | Yes (partly; the questionnaire is analyzed separately) | N/A (most likely manual) | Chi-square, descriptive statistics | N/A (536 MOOC participants answered the questionnaire) |
| Mangaroska, Vesin, and Giannakos (2019) | Predict student performance | E-learning portal (× 2), integrated development environment (IDE) | Performance test data, activity log data (× 3) | Yes | Automatic. The system collects and aggregates data from different sources; data are integrated in a Learning Record Store | Descriptive statistics, Spearman correlation, linear regressions, visualization | 21 (one teacher and 20 computer science students) |
| Villano, Harrison, Lynch, and Chen (2018) | Examine the relationship between student retention and an early alert system (controlling for a number of variables) | Student information system, early alert system | Student background information, final grades, workload, school data (e.g., location, fee), early alert system data | Yes | Automatic. The university collects and integrates data from different IT systems in a data warehouse | Survival analysis | N/A (16,142 records captured from 2011 to 2013 were analyzed) |
| Wong, Kwong, and Pegrum (2018) | Examine whether an augmented reality app on integrity and ethics can help change students’ perspectives on these subjects | AR platform, LMS | Activity log data, text (× 2) | No (data are analyzed separately) | Data/data sources are not integrated | Descriptive statistics, text analysis, visualization | N/A (1259 students participated, but not all participants’ data were included in the subsequent analyses) |
| Sandoval, Gonzalez, Alarcon, Pichara, and Montenegro (2018) | Predict students who are at risk of failing classes | Student information system, LMS | Student background information, final grades, activity log data | Yes | Automatic. Data are extracted from the data sources and encrypted; some attributes are then recoded into similar types before being integrated in a relational database | Linear regressions, random forest | 21,314 students (over three semesters) |
| Sun, Xie, and Anderman (2018) | Examine the effect of self-regulation on academic achievement in flipped classrooms | Questionnaire, LMS | Questionnaire data, performance test data, partial course grades | Yes | Manual. Grades obtained from instructors are combined with survey data | Structural equation modeling, multilevel regression | 151 US undergraduate students |
| Giannakos, Sharma, Pappas, Kostakos, and Velloso (2019) | Examine whether including physiological sensing data provides advantages for predicting skill acquisition (and, more generally, for the design of learning technologies) | Eye tracker, wristband (physiological signals), EEG cap, video, game | Eye tracker data, physiological arousal data, EEG data, video-derived data, activity log data, performance test data | Yes | Automatic. Features were extracted separately for each data source, and the data were then integrated using R | LASSO regression, random forest, ANOVA | 17 participants from a major European university |
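Several of the automatic integration approaches in this table come down to joining records from separate sources on a shared identifier or loading them into one relational database. Examples include Lopez Guarin, Guzman, and Gonzalez (2015), Park, Yu, and Jo (2016), and Sandoval et al. (2018). The sketch below illustrates that general idea with Python's built-in `sqlite3` module. The function name `integrate`, the table and column names, and the sample rows are all hypothetical; it is not a reconstruction of any reviewed study's pipeline.

```python
import sqlite3


def integrate(admissions, academic, activity):
    """Join three hypothetical per-student sources on a shared student_id.

    admissions: list of (student_id, entry_score)
    academic:   list of (student_id, final_grade)
    activity:   list of (student_id, event) log rows
    Returns one combined row per student present in both admissions and academic.
    """
    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    # One table per source, mimicking separate systems (hypothetical schemas).
    cur.execute("CREATE TABLE admissions (student_id TEXT PRIMARY KEY, entry_score REAL)")
    cur.execute("CREATE TABLE academic (student_id TEXT PRIMARY KEY, final_grade REAL)")
    cur.execute("CREATE TABLE activity (student_id TEXT, event TEXT)")
    cur.executemany("INSERT INTO admissions VALUES (?, ?)", admissions)
    cur.executemany("INSERT INTO academic VALUES (?, ?)", academic)
    cur.executemany("INSERT INTO activity VALUES (?, ?)", activity)
    # Inner join admissions with academic records, then LEFT JOIN the activity log
    # so that students without any logged events still appear with a count of 0.
    rows = cur.execute(
        """
        SELECT a.student_id, a.entry_score, g.final_grade,
               COUNT(l.event) AS n_events
        FROM admissions AS a
        JOIN academic AS g ON g.student_id = a.student_id
        LEFT JOIN activity AS l ON l.student_id = a.student_id
        GROUP BY a.student_id, a.entry_score, g.final_grade
        ORDER BY a.student_id
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    admissions = [("s1", 71.5), ("s2", 64.0), ("s3", 80.0)]
    academic = [("s1", 3.2), ("s2", 2.1)]  # s3 has no academic record
    activity = [("s1", "login"), ("s1", "quiz"), ("s2", "login")]
    print(integrate(admissions, academic, activity))
    # → [('s1', 71.5, 3.2, 2), ('s2', 64.0, 2.1, 1)]
```

The inner join silently drops students who are missing from either source (`s3` above). A `LEFT JOIN` from `admissions` would keep them, with `NULL` in place of the missing grade. Which choice is appropriate affects how many records end up in the analysis.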
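Di Mitri et al. (2017) and Mangaroska, Vesin, and Giannakos (2019) instead store events from every source in a Learning Record Store. The sketch below shows that pattern under explicit assumptions:

- There are two hypothetical sources: a wristband export with Unix timestamps, and an application log whose ISO-8601 timestamps carry a UTC offset.
- Each record is mapped to a simplified, xAPI-style statement (actor, verb, object, timestamp), and all statements are merged into one time-ordered list that stands in for the store.
- The statement shape is only a subset of xAPI, and every `example.org` IRI is a placeholder rather than registered xAPI vocabulary.

This is not the Learning Pulse Server's actual implementation.

```python
from datetime import datetime, timezone


def to_statement(actor_id, verb, object_id, ts):
    """Build a simplified xAPI-style statement; all IRIs are hypothetical placeholders."""
    return {
        "actor": {"account": {"homePage": "https://example.org", "name": actor_id}},
        "verb": {"id": f"https://example.org/verbs/{verb}"},
        "object": {"id": f"https://example.org/objects/{object_id}"},
        "timestamp": ts.isoformat(),
    }


def merge_sources(wristband_rows, app_rows):
    """Map two hypothetical source formats to statements and merge them by time.

    wristband_rows: list of (learner_id, unix_seconds, heart_rate_bpm)
    app_rows:       list of (learner_id, iso_timestamp_with_offset, app_category)
    """
    store = []  # (datetime, statement) pairs; stands in for a Learning Record Store
    for learner, epoch_s, heart_rate in wristband_rows:
        ts = datetime.fromtimestamp(epoch_s, tz=timezone.utc)
        stmt = to_statement(learner, "measured", "heart-rate", ts)
        # xAPI allows extra values under result.extensions, keyed by IRIs.
        stmt["result"] = {"extensions": {"https://example.org/ext/bpm": heart_rate}}
        store.append((ts, stmt))
    for learner, iso_ts, app_category in app_rows:
        # Assumes the timestamp includes an offset, so it compares with the UTC times above.
        ts = datetime.fromisoformat(iso_ts)
        store.append((ts, to_statement(learner, "used", app_category, ts)))
    # Sort on the datetime objects themselves rather than on the formatted strings.
    store.sort(key=lambda pair: pair[0])
    return [stmt for _, stmt in store]


if __name__ == "__main__":
    wristband = [("learner1", 1488358800, 72)]  # 2017-03-01T09:00:00 UTC
    apps = [("learner1", "2017-03-01T08:59:00+00:00", "text-editor")]
    for s in merge_sources(wristband, apps):
        print(s["timestamp"], s["verb"]["id"])
    # → 2017-03-01T08:59:00+00:00 https://example.org/verbs/used
    # → 2017-03-01T09:00:00+00:00 https://example.org/verbs/measured
```

Normalizing every source to one statement format with timezone-aware timestamps is what makes the later merge a simple sort. Sources with naive (offset-free) timestamps would first need an explicit time zone.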