
Table 5 Comparison of the five metrics

From: A review on process-oriented approaches for analyzing novice solutions to programming problems

Metric

Measure/criteria

Advantages

Disadvantages

EQ

Measures how many syntax errors a student encounters during a single programming session and how successive compilation failures in a session compare in terms of error message, location, and edit location

- Shown to correlate inversely with grades, so it can be used to predict student performance
- Can indicate how well or poorly a student is progressing

- Depends on parameter values
- Does not account for the time a student takes to resolve an error
- Assumes that students work on a single source file, or on multiple files in a linear manner, which is a flawed basis for constructing consecutive compilation pairings
- Varies across groups, environments, and contexts

WS

Uses time as a predictor of performance, based on how a student responds to different types of error compared to their peers

- Addresses the shortcoming of EQ by constructing pairings on a per-filename basis
- Can be used to predict student performance even early in the course
- Outperforms EQ as a predictive measure

- Takes no measures to prevent superficial changes to source code from being incorrectly flagged as semantic changes

PDS

Measures probabilistic distance between an observed student solution and a correct solution using a Markov model

- Can be used to determine whether an edit or student path is (a) typical of students who have mastered the content and (b) productive in progressing toward a solution
- Can be adapted to focus on model state transitions that indicate misconceptions, or on other model-based goals of the data miner

- A model of the required algorithmic components must be identified first
- Student evaluation is constrained by the model used

NPSM

Characterizes students’ programming activity in terms of the dynamically changing syntactic and semantic correctness of programs

- Can be used to predict student performance by considering the programming process more holistically
- A substantially better predictor than EQ and WS

- Provides only a rough proxy for determining semantic correctness (e.g., presence or absence of runtime exceptions in the last execution attempt)

RED

Quantifies repeated errors by looking at the number of repeated error strings a student encounters and the length of these strings

- Less context dependent
- Useful for short sequences
- Can be significantly reduced by an editor that has previously been shown to result in significantly fewer compiler errors

- Its bounded nature raises some questions when a large number of data points are involved
- Prone to outliers
- Has not been validated as correlating with student performance
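
The EQ and RED rows above both describe scores computed over a sequence of compilation events. The following is a minimal Python sketch of that idea, not the published definitions. The `Compilation` record, the function names, the weights, and the per-run score `r**2 / (r + 1)` are all assumptions made for illustration, since the table does not give formulas or parameter values. The EQ-style function also ignores edit location, which the table lists as one of the comparison criteria.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Compilation:
    """Hypothetical record of one compilation event.

    error_type is None when the compilation succeeded.
    """
    error_type: Optional[str] = None
    error_line: Optional[int] = None


def error_quotient(events: Sequence[Compilation],
                   w_both: float = 8.0,
                   w_same_type: float = 3.0,
                   w_same_line: float = 3.0) -> float:
    """EQ-style score in [0, 1]: mean normalized penalty over consecutive pairs.

    The weights are illustrative assumptions; the table only notes that
    EQ depends on parameter values. Edit location is not modeled here.
    """
    pairs = list(zip(events, events[1:]))
    if not pairs:
        return 0.0
    max_score = w_both + w_same_type + w_same_line
    total = 0.0
    for first, second in pairs:
        score = 0.0
        if first.error_type is not None and second.error_type is not None:
            score += w_both  # both compilations in the pair failed
            if first.error_type == second.error_type:
                score += w_same_type  # same error message/type
            if (first.error_line is not None
                    and first.error_line == second.error_line):
                score += w_same_line  # same error location
        total += score / max_score
    return total / len(pairs)


def repeated_error_density(events: Sequence[Compilation]) -> float:
    """RED-style score: sum of r**2 / (r + 1) over runs of a repeated error.

    r counts the consecutive pairs in a run that share the same error.
    The per-run formula is an assumption of this sketch.
    """
    total = 0.0
    run = 0  # repeated pairs in the current run
    for first, second in zip(events, events[1:]):
        if (first.error_type is not None
                and first.error_type == second.error_type):
            run += 1
        else:
            if run:
                total += run * run / (run + 1)
            run = 0
    if run:
        total += run * run / (run + 1)
    return total


if __name__ == "__main__":
    session = [
        Compilation("missing ';'", 3),
        Compilation("missing ';'", 3),
        Compilation("missing ';'", 5),
        Compilation(),  # successful compilation
        Compilation("unknown symbol", 7),
        Compilation("unknown symbol", 7),
    ]
    print("EQ-style score:", error_quotient(session))
    print("RED-style score:", repeated_error_density(session))
```

Keeping the weights as keyword arguments makes the parameter dependence noted in the EQ row explicit. The RED-style function is not normalized by session length, so longer runs of the same error dominate the total.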