
Student placement and skill ranking predictors for programming classes using class attitude, psychological scales, and code metrics

Abstract

In some situations, it is necessary to measure personal programming skills. For example, students often must be divided according to skill level and motivation to learn, and companies recruiting employees must rank candidates by evaluating programming skills through programming tests, programming contests, etc. This process is burdensome because teachers and recruiters must prepare, implement, and evaluate a placement examination. This paper tries to predict the placement and ranking results of programming contests via machine learning without such an examination. The explanatory variables used for machine learning fall into three categories: Psychological Scales, Programming Tasks, and Student-answered Questionnaires. The participants are university students enrolled in a Java programming class. The target variables are the placement result based on an examination by the class teacher and the ranking results of the programming contest. Our best classification model, a decision tree, has an F-measure of 0.912, while our best ranking model, an SVM-rank, has an nDCG of 0.962. In both prediction models, the best explanatory variables come from the Programming Task, followed in order by the Psychological Scale and the Student-answered Questionnaire. Our classification model uses 9 explanatory variables, while our ranking model uses 20; both include all three categories. Source code complexity, a source code metric from the Programming Task, shows the best performance when the prediction uses only one explanatory variable. The contributions and implications of this paper are: (1) this method can automate some of the teacher’s workload, which may improve educational quality and increase the number of acceptable students in the course; and (2) this paper demonstrates the potential of using difficult-to-formulate information, such as a Psychological Scale, for an evaluation.

Introduction

Sometimes it is necessary to measure a person’s programming skills.Footnote 1 For example, in education, students often must be divided into advanced and intermediate classes based on skill level, motivation to learn, etc. As another example, a company recruiting and placing employees must rank candidates by evaluating programming skills through programming tests, programming contests, etc. However, these processes are burdensome because the evaluator (teacher or recruiter) must prepare, implement, and assess an examination (e.g., a placement test or a questionnaire regarding class level) to determine programming ability. Moreover, when a teacher conducts such a questionnaire, the interpretation is subjective, which can cause problems in a class with two or more assigned teachers or when the teacher changes. Other problems also exist: for example, some students merely memorize the answers to past examinations, while others cram the night before a test.

This paper aims to properly place or rank students using an easier method than the traditional time-consuming examination. We focus on a class for second-year undergraduate students learning to program in Java at Waseda University. In this class, students participate in a programming contest at their department’s orientation about a month after the semester begins. The purpose of the contest is to increase students’ interest in programming, but it is also designed so that programming skills can be evaluated. Around the same time as the contest, students are divided into an advanced class and an intermediate class according to a placement examination administered by the teacher.

In this paper, we try to substitute for the examination with a questionnaire that asks students about their class attitude and with the results of a Programming Task in the class. This information is then used to create a machine-learning model to predict the placement results as well as the ranking results of the programming contest. Three categories of explanatory variables are used: (1) Programming Task, (2) Student-answered Questionnaire, and (3) Psychological Scale. The Programming Task evaluates students’ objective class attitudes and degree of understanding. The Student-answered Questionnaire gauges students’ subjective class attitudes and understanding as a self-assessment based on experiences within class hours. The Psychological Scale indicates students’ self-assessment based on experiences outside class hours.

The Psychological Scale affects students’ academic performance (Duckworth et al. 2007; Duckworth and Gross 2014; Duckworth and Quinn 2009). Previous studies have employed the Programming Task and Student-answered Questionnaire in evaluations and have demonstrated the relationship of these two variables with the Psychological Scale. However, the Psychological Scale itself has not been used as an evaluation criterion. Thus, this study employs all three sets of explanatory variables.

Figure 1 shows our two prediction models. The best classification model to predict the placement results, which we created with a decision tree, has a precision of 0.943, recall of 0.908, and an F-measure of 0.912. The best ranking model to predict the ranking results of the programming contest, which we created with an SVM-rank, has an nDCG of 0.962. Additionally, we evaluated the effects of the explanatory variables on the placement results and the programming contest. We investigated 9 factors affecting the placement results and 20 factors affecting the ranking results of the programming contest.

Fig. 1

Overview of our two prediction models with explanatory variables and objective variables

The contributions of this paper are:

  • This method can automate some of the teacher’s work, which may improve the quality of the lessons and increase the number of acceptable students.

  • The evaluation shows that the model changes the students’ class attitude.

Related work

Methods to support education are mainly divided into student support and teacher support. Many studies have focused on student support in programming education, such as visualization of program execution status (Ishizue et al. 2017b; 2018) and a method to learn a language based on another language already learned (Li et al. 2017). This study focuses on teacher support.

How are students’ programming skills traditionally assessed?

Traditionally, students’ programming skills are assessed by whether they can solve Programming Tasks. McCracken et al. (2001) conducted a multi-national, multi-institutional study of assessments of the programming skills of first-year CS students. They defined the general evaluation (GE) criteria and the degree of closeness (DoC) evaluation criteria. The GE criteria objectively assess how accurately students implement their solutions. The DoC criteria subjectively evaluate how well the generated sub-problems were abstracted and transformed into sub-solutions.

The GE criteria consist of:

  • Execution: Does the program execute without errors? (30 points)

  • Verification: Does the program correctly produce answers to the benchmark data set? (60 points)

  • Validation: Does the program represent what is asked for in the exercise specifications? (10 points)

  • Style (Optional): Does the style of the program conform to local standards? (10 points)

The points allocated to each criterion reflect its relative importance.

The DoC criteria consist of:

  1. Does the program compile and work?

  2. Is part or all of the method missing?

  3. Are there meaningful comments, stub code, etc.?

  4. Does the source code complete little of the program?

  5. Does the source code show that the student has no idea about how to approach the problem?

The results of programming contests are also used to assess programming skills. Trotman and Handley (2008) indicated that programming contests with automated assessment have become popular activities for training programming skills. Verdú et al. (2012) also indicated that competition is a very important element, since the combination of a contest with automated assessment provides the educational community with an effective and efficient learning tool in the context of teaching programming.

Can additional variables be used to predict programming skills?

We investigated explanatory variables that can predict general academic skills, not only programming skills. Prior studies indicate that the Psychological Scale may be such an explanatory variable.

We use well-known Psychological Scales as explanatory variables in machine learning. The following scales are thought to affect academic performance. Deci and Ryan (1985, 2002) studied intrinsic motivation in human behavior; they defined intrinsic motivation as the life force or energy for an activity and for the development of an internal structure. The degree of self-efficacy affects the efficiency of such behavior. According to Bandura (1977), self-efficacy expectancies determine the initial decision to perform a behavior, the effort expended, and persistence in the face of adversity. Sherer et al. (1982) developed a self-efficacy scale.

The task value is a scale focusing on the value aspect of motivation. According to Eccles and Wigfield (1985), the task value is divided into three subscales (interest value, attainment value, and utility value). Moreover, Ida (2001) further divided attainment value and utility value into two for a total of five subscales. The attainment value is divided into the private attainment value, which is an internal absolute standard that varies by individual, and the public attainment value, which focuses on the superiority/inferiority with others. The utility value is divided into the institutional utility value, which is used when learning is necessary to pass an examination for employment or admission, and practical utility value, which is used when learning is useful in occupational practice. Ida (2001) also proposed a task value evaluation scale.

According to Duckworth and Gross (2014) and Duckworth et al. (2007), and Duckworth and Quinn (2009), self-control is needed to achieve goals that require long-term effort. Self-control allows one to focus on a goal (consistency of interest) and persevere through difficulties (perseverance of effort). They called this combination Grit, and developed an evaluation scale.

Goal orientation is divided into three subscales: mastery orientation, performance approach, and performance avoidance. Elliot and Church (1997) examined their influences and factors.

Ota (2010), Ryckman et al. (1990, 1996), and Smither and Houston (1992) developed multi-dimensional competitiveness scales. Multi-dimensional competitiveness is divided into three subscales: instrumental competitiveness, avoidance of competition, and never-give-up attitude. Specific questions based on these scales are shown in Table 1.

Some studies have investigated these Psychological Scales and learning. For example, Robbins et al. (2004) examined the relationship between psychosocial and study skill factors (PSFs) and college outcomes. They found that the best predictors for grade point average (GPA) are academic self-efficacy and achievement motivation. Shen et al. (2007) investigated the influence of a mastery goal, performance-approach goal, avoidance-approach goal, individual interest, and situational interest on students’ learning of physical education. They reported that a mastery goal is a significant predictor of situational interest.

We have used class attitude as an explanatory variable for machine learning. Class attitude is also thought to affect the understanding of class content. For example, Saito et al. (2017) studied the relationship between attitudes and understanding of programming with an emphasis on the differences between text-based and visual-based programming.

How is machine learning previously used in relevant areas?

In this paper, we use classification and ranking machine learning. Various fields, including education, have used machine learning.

Some studies actually predict students’ grades or scores by machine learning. Okubo et al. (2017) studied a method to predict students’ final grades using a recurrent neural network (RNN) and a time series of learning activities logs (e.g., attendance, quiz, and report) in multiple courses. Yasuda et al. (2016) proposed an automatic scoring method for a conversational English test using automatic speech recognition and machine learning techniques.

Some studies use machine learning to find students who need assistance. Ahadi et al. (2015) and Castro-Wunsch et al. (2017) proposed methods to automatically identify students in need of assistance; they predict such students from students’ source code snapshot data using machine learning approaches such as decision trees. Hong et al. (2015) added a function to the learning system SQL-Tutor that identifies students who are likely to abandon the programming task and encourages them by displaying motivational messages.

Additional studies have investigated dropouts. Kotsiantis et al. (2003) proposed a prototype web-based support tool using a Naive Bayes algorithm, which can automatically recognize students with a high probability of dropping out. Márquez-Vera et al. (2016) predicted the high school dropout rates of students at different steps in a course to determine the best indicators for dropping out.

Appropriately categorizing students takes more time and effort as class size increases. Sohsah et al. (2016) classified educational materials in low-resource languages with machine learning. Machine learning is used not only for teachers but also for school cost problems: Jamison (2017) used machine learning to address the declining enrollment rate of students accepted at a given college or university due to academic, economic, and logistical reasons.

This paper uses three different kinds of explanatory variables. Such a dataset is called multi-view or multi-source data, and machine learning dealing with this kind of data is called multi-view learning. According to the survey by Zhao et al. (2017), multi-view learning has made great advances in recent years; it considers learning from multiple views to improve overall performance. Although this paper uses traditional methods, applying multi-view learning may further improve the performance of our machine learning models in the future.

Method

We used supervised machine learning to predict students’ placement results (a classification problem) and the ranking results of the programming contest (a ranking problem) for a Java programming class at Waseda University. Three sets of explanatory variables were employed: (1) Psychological Scale, (2) Programming Task, and (3) Class Questionnaire. We then found the best algorithm and the best combination of explanatory variables, and used the results to create and evaluate models for both problems. For the classification problem, we used malss (https://github.com/canard0328/malss), a Python library for machine learning developed by Kamoshida and Sakamoto (2016). For the ranking problem, we used SVMrank (www.cs.cornell.edu/people/tj/svm_light/svm_rank.html), a C library for rank learning. The ranking SVM algorithm was developed by Joachims (2002, 2006). The ranking SVM forms pairs of elements from the sample and learns by minimizing the number of pairs whose predicted order disagrees with the correct order. We used this program because it is free for scientific use and, being written in C, its calculations were expected to be fast. We published a program to create and evaluate models using these libraries on the following webpages:

This paper investigated the following research questions (RQs):

  • RQ1: How much does each explanatory variable predict the placement results?

  • RQ2: How much does each explanatory variable predict the programming contest ranking?

  • RQ3: What is the best combination of explanatory variables to predict the placement results?

  • RQ4: What is the best combination of explanatory variables to predict the ranking results of the programming contest?

Participants

This study included 65 second-year undergraduate students at Waseda University (Japan) enrolled in a Java programming class. This class is equivalent to a CS1 level. In this class, students participate in a programming contest at their department’s orientation about a month after the semester begins. The contest is designed to increase students’ interest in programming. Around the same time as the contest, students complete a placement test. Additionally, the students engage in Programming Tasks, answer a Psychological Test, and complete a questionnaire about the class. Then, the students are placed in either an advanced or intermediate course.

Of the participants, 50 students were placed in the advanced course and 15 students were placed in the intermediate course.

Explanatory variables

Three materials were prepared as explanatory variables in machine learning: (1) Psychological Scale, (2) Programming Task, and (3) Class Questionnaire.

1. Psychological Scales

Participants completed a psychological test. Table 1 shows the questions. Each question was evaluated on a seven-level scale: (7) Strongly Agree, (6) Agree, (5) Somewhat Agree, (4) Neutral, (3) Somewhat Disagree, (2) Disagree, and (1) Strongly Disagree.

Table 1 Psychological questions

Table 2 shows the Psychological Scales corresponding to each question. Question 1 measured intrinsic motivation. Question 2 measured self-efficacy. We used simple statements such as “I like ___.” and “I am good at ___.”. Questions 3 to 10 were based on the task value scale (Eccles and Wigfield 1985). We used question statements developed by Ida (2001). Questions 11 to 18 were based on the Short Grit Scale (Duckworth and Gross 2014; Duckworth and Quinn 2009; Duckworth et al. 2007). We used question statements developed by Nishikawa et al. (2015). Questions 19 to 24 were based on goal orientation (Tanaka and Yamauchi 2000). Questions 25 to 31 were based on multi-dimensional competitiveness (Ota 2010).

Table 2 Psychological Scales of each question

2. Programming task

Each Programming Task was from the Aizu Online Judge (AOJ). AOJ is the most famous online judge system in Japan. AOJ has many programming problems, ranging from simple ones such as “Hello World” to difficult ones such as past ACM-ICPC (https://icpc.baylor.edu/) problems. When a user submits his or her program source code via the submission form on the AOJ website, the correctness of the program is verified by executing it on the server side. Table 3 lists the IDs and names of the problems used. Additionally, we ranked each problem according to its difficulty by considering the content and the correct answer rate; a larger number indicates a more difficult level. Moreover, we measured source code metrics on the code that students submitted to AOJ. To collect the source code, we used Nightmare, a high-level browser automation library written in JavaScript. To measure the metrics, we used Checkstyle, a static analysis tool for Java. Thanks to the simple APIs of each library, an automatic measurement program of 100 to 200 LOC can be written easily. The maximum values from Checkstyle’s default configuration were used.

Table 3 Problem ID, name, and difficulty of the Programming Tasks from AOJ (all problems are from http://judge.u-aizu.ac.jp/onlinejudge/description.jsp?lang=en&id=ProblemID)

The following metrics were used to detect if the maximum value was exceeded:

(1) Is Solved, (2) LOC, (3) Boolean Expression Complexity, (4) Class Data Abstraction Coupling, (5) Class Fan Out Complexity, (6) Cyclomatic Complexity, (7) Executable Statement Count, (8) Max Len file, (9) Max Len method, (10) Max Line Len, (11) Max Outer Types, (12) Max Param, (13) NCSS Class, (14) NCSS File, (15) NCSS Method, (16) Npath Complexity, and (17) Too Many Methods.
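For illustration, the per-submission metric extraction described above could be scripted roughly as follows. This is a minimal sketch, not the published measurement program: the jar name, configuration file, and XML-parsing details are assumptions, and the configuration is assumed to enable the checks listed above.

```python
import subprocess
import xml.etree.ElementTree as ET

# Hypothetical paths: point these at the local Checkstyle jar and a config
# that enables the size/complexity checks above (CyclomaticComplexity,
# ClassFanOutComplexity, NPathComplexity, ...).
CHECKSTYLE_JAR = "checkstyle-all.jar"
CHECKSTYLE_CFG = "metrics_checks.xml"

def measure_metrics(java_file):
    """Run Checkstyle on one submitted source file and count violations per check."""
    # check=True is deliberately omitted: Checkstyle exits non-zero when
    # violations are found, which is exactly the case we want to parse.
    result = subprocess.run(
        ["java", "-jar", CHECKSTYLE_JAR, "-c", CHECKSTYLE_CFG, "-f", "xml", java_file],
        capture_output=True, text=True)
    counts = {}
    for error in ET.fromstring(result.stdout).iter("error"):
        # The 'source' attribute names the check that fired, e.g.
        # ...checks.metrics.CyclomaticComplexityCheck
        check = error.get("source", "").split(".")[-1]
        counts[check] = counts.get(check, 0) + 1
    return counts

if __name__ == "__main__":
    print(measure_metrics("Submission.java"))  # placeholder file name
```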

3. Questionnaire about the class

We implemented a questionnaire about the class. This questionnaire was created to obtain a subjective evaluation of the students themselves. Participants completed the questionnaire in the class immediately after the placement test. Table 4 shows the questions. All questions were evaluated on a seven-level scale. These questions were created based on the end-of-term questionnaire that Waseda University employs for all classes.

Table 4 Questionnaire about the class

Objective variables

There were two kinds of objective variables: the placement results and the ranking results of programming contest. We predicted each objective variable using the explanatory variables.

Placement results

Table 5 shows the problem statements of the placement test (programming quiz). The programming quiz took 90 min. The quiz also asked each student which class placement he or she preferred: advanced or intermediate (Hope Class). Although the examination result was not used as an explanatory variable for machine learning, it was used by the teacher for class placement; here, the examination result is used only for sample labeling.

Table 5 Examination programming quiz

Ranking results of the programming contest

Table 6 shows the problem descriptions of the programming contest. The contest time was 90 min. Each problem was given a maximum score. When a student solved a problem, his or her score was calculated by the following equation:

Table 6 Programming contest problems. All problems are available from https://github.com/AI-comp/Problems2017 (in Japanese)

\(\text{Score} = \text{maximum score of the problem} \times \frac{(\text{remaining time} / \text{contest time}) + 1}{2}\)

The ranking order was determined according to the sum of the scores. The contest score was not used as an explanatory variable for machine learning; it is used only for sample ranking.
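As a worked example of the scoring rule (the numbers are illustrative, not taken from the contest):

```python
def contest_score(max_score, remaining_time, contest_time=90):
    """Score = max_score * ((remaining_time / contest_time) + 1) / 2.
    A problem solved at the very last moment is worth half its maximum score;
    one solved immediately is worth the full maximum."""
    return max_score * ((remaining_time / contest_time) + 1) / 2

# A 100-point problem solved with 45 of the 90 minutes remaining:
# 100 * (0.5 + 1) / 2 = 75 points.
print(contest_score(100, 45))  # 75.0
```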

Algorithm selection

This paper used supervised learning algorithms for the classification problem. Six algorithms were tested to create a better model:

  • Support Vector Machine with RBF Kernel (SVM)

  • Support Vector Machine with Linear Kernel (SVML)

  • Logistic regression (LR)

  • Decision tree (DT)

  • Random forest (RF)

  • k-nearest neighbors (NN)

SVM determines the boundary line that classifies the data by maximizing the sum of the margins to the sample data closest to the boundary. It can be used not only for classification but also for regression, with excellent recognition performance. In a two-class prediction, LR uses a logistic curve to calculate the probability, between 0 and 1, of belonging to one class. DT represents the branching process as a tree structure and branches the target data from the top down to determine the final class. RF creates multiple decision trees by randomly selecting data from the training data and determines the final class by majority voting over the results predicted by each decision tree. NN classifies a sample by a majority vote among the classes of its nearest training data. These algorithms are well known and popular in machine learning; Bishop (2006) summarized their principles, strengths, and weaknesses.

Malss supports all of these algorithms. When the user passes data as parameters to malss, it tries these algorithms with cross-validation and parameter tuning using a grid search, and outputs a prediction model and a performance report with the F-measure. We used malss 1.1.2 with Anaconda 5.0.0 on Microsoft Visual Studio Community 2017 Version 15.5.3. We used values close to the default ones of malss as parameters. All details can be confirmed by referring to our published program.
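For example, producing the classification models might look roughly like the following sketch. The file names are placeholders, and the exact API should be checked against the malss documentation for version 1.1.2.

```python
import pandas as pd
from malss import MALSS  # pip install malss

# X: explanatory variables (Psychological Scales, Programming Task metrics,
#    Class Questionnaire answers); y: placement label (advanced/intermediate).
# File names below are placeholders, not the published dataset.
X = pd.read_csv("explanatory_variables.csv")
y = pd.read_csv("placement_labels.csv")["placement"]

clf = MALSS("classification")       # tries SVM, SVML, LR, DT, RF, and kNN
clf.fit(X, y, "placement_report")   # cross-validation + grid search, then
                                    # writes a report including F-measures
```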

To evaluate the prediction quality of the model, we used stratified five-fold cross-validation. First, the data set was divided into five pieces so that each label had the same ratio in each piece. One piece was used for testing, and the remaining four were used for learning. The cross-validation calculated the F-measure, along with precision and recall, using each of the five pieces in turn as test data.

There are many measures for evaluating classification models (e.g., accuracy, recall, precision, specificity, F-measure, AUC). The F-measure is a well-balanced measure calculated from recall and precision. This paper used the F-measure as the classification measure due to its calculation time and its popularity for classification problems. If the primary purpose were to detect students who fail, it might be more important to focus on other measures such as specificity. For the ranking problem, we used Support Vector Machine for Ranking via SVMrank. We again used stratified five-fold cross-validation, which calculated the normalized Discounted Cumulated Gain (nDCG) for the ranking problem on each of the five divided datasets.

The nDCG was calculated by the following expression:

\(\mathrm{DCG} = \mathrm{rel}_{1}+\sum^{k}_{i=2}\frac{\mathrm{rel}_{i}}{\log_{2} i}\), \(\mathrm{nDCG}=\frac{\mathrm{DCG_{predict}}}{\mathrm{DCG_{ideal}}}\)

(\(\mathrm{rel}_{i}\): relevance of the ith element in the ranking, \(k\): number of elements)
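The nDCG above can be computed directly from two relevance lists; the following is a minimal sketch, with illustrative relevance values that are not taken from the paper.

```python
import math

def dcg(relevances):
    """DCG = rel_1 + sum_{i=2..k} rel_i / log2(i), with i 1-indexed as above."""
    return relevances[0] + sum(rel / math.log2(i)
                               for i, rel in enumerate(relevances[1:], start=2))

def ndcg(predicted_relevances, ideal_relevances):
    """nDCG = DCG of the predicted ranking / DCG of the ideal ranking."""
    return dcg(predicted_relevances) / dcg(ideal_relevances)

# Illustrative contest scores used as relevance: the ideal ranking lists them
# in descending order; the predicted ranking lists them in the model's order.
ideal = [90, 75, 60, 40]
predicted = [60, 90, 75, 40]
print(round(ndcg(predicted, ideal), 3))  # approximately 0.975
```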

We used the training data as a test set (a closed test). Moreover, to reduce the deviation of the data, after dividing the data, the cross-validation process was repeated nine times. The median value was subsequently used.

We used svm_rank_learn and svm_rank_classify included in SVMrank V1.00 on Microsoft Visual Studio Community 2017 Version 15.5.3. We created and evaluated models by brute force using the following parameter ranges, which seem to be sufficient:

  • Kernel: LINEAR and RBF

  • Rescaling method to use for loss: (1) slack rescaling and (2) margin rescaling

  • L-norm to use for slack variables: (1) L1-norm and (2) squared slacks

  • C: Trade-off between training error and margin: [1, 10, 100, 1000, 10000, 100000, 1000000]

  • Parameter gamma in the RBF kernel: [1, 10, 100, 1000, 10000, 100000, 1000000]

All details can be confirmed by looking at our published program.
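A sketch of how such a brute-force sweep might be scripted is shown below. The flag spellings (-t, -o, -p, -c, -g) follow the usual SVM-light/SVMrank command-line conventions and should be confirmed against the local svm_rank_learn build; the data and model file names are placeholders.

```python
import itertools
import subprocess

# Parameter grid from the list above.
kernels = {"LINEAR": "0", "RBF": "2"}   # assumed -t codes (0: linear, 2: rbf)
rescaling = ["1", "2"]                  # 1: slack rescaling, 2: margin rescaling
slack_norm = ["1", "2"]                 # 1: L1-norm, 2: squared slacks
c_values = [1, 10, 100, 1000, 10000, 100000, 1000000]
gammas = [1, 10, 100, 1000, 10000, 100000, 1000000]

for kname, t in kernels.items():
    gamma_grid = gammas if kname == "RBF" else [None]  # gamma only matters for RBF
    for o, p, c, g in itertools.product(rescaling, slack_norm, c_values, gamma_grid):
        cmd = ["svm_rank_learn", "-t", t, "-o", o, "-p", p, "-c", str(c)]
        if g is not None:
            cmd += ["-g", str(g)]
        cmd += ["train.dat", f"model_{kname}_{o}_{p}_{c}_{g}.dat"]
        subprocess.run(cmd, check=True)  # then evaluate with svm_rank_classify
```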

Feature selection

In the psychological test, we converted the answers to the 31 questions into scores (1 to 7 points). Then, we calculated the sum of the scores for each of the 15 subscales.

Next, we measured the metrics for all student-solved tasks. Because using every metric of every problem would produce an enormous number of explanatory variables, we instead used scores ranked by the magnitude of each metric as explanatory variables. Moreover, we added the total number of answers and the number of answers per difficulty level [Number of Solved Tasks (AOJ) and Difficulty Level 1 to 4 (AOJ)].

Finally, we tried to create a model that improved the evaluation score. We investigated the influence of each explanatory variable and removed ineffective variables to avoid high variance. First, we used the explanatory variable with the best F-measure. Then, we added the explanatory variable with the next best F-measure. When more than one explanatory variable had the best F-measure, we randomly chose one and proceeded to the next step. This procedure was repeated, like a greedy algorithm, until all variables had been added. Finally, we regarded the model with the best F-measure found during the procedure as the best model in our method.
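One reading of this procedure is sketched below: variables are added in descending order of their individual F-measures (as in Table 7), the combined model is re-evaluated after each addition, and the best-scoring subset seen is kept. The `single_scores` and `evaluate` interfaces are assumptions made for illustration, not part of the published program.

```python
def forward_selection_by_single_scores(features, single_scores, evaluate):
    """Greedy-style forward selection over explanatory variables.

    features      -- iterable of variable names
    single_scores -- dict: variable name -> its single-variable F-measure (or nDCG)
    evaluate      -- callable: subset of variables -> cross-validated score
    """
    # Add variables in descending order of their individual scores.
    order = sorted(features, key=lambda f: single_scores[f], reverse=True)
    selected, best_subset, best_score = [], [], float("-inf")
    for feature in order:
        selected.append(feature)
        score = evaluate(selected)          # re-run the cross-validated model
        if score > best_score:              # keep the best subset seen so far
            best_score, best_subset = score, list(selected)
    return best_subset, best_score
```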

Results and discussion

RQ1: how much does each explanatory variable predict the placement results?

Table 7 shows the classification results. As expected, the F-measure of Hope Class is relatively high, suggesting that the teacher considers Hope Class in the placement, although it is not the sole factor. The explanatory variables of the measured metrics show high F-measures; in particular, Class Fan Out Complexity shows the highest F-measure. Among the Psychological Scales, self-efficacy and interest also show high F-measures, suggesting that these explanatory variables predict the placement results, but the other F-measures of the Psychological Scales are not very good. For the task value, the interest value is high, but the others are low. Never-Give-Up Attitude shows the lowest F-measure. Questions about the class (Q1–10) show F-measures that are higher than those of the Psychological Scales but lower than those of the measured metrics. From the Programming Tasks using AOJ, Number of Solved Tasks (AOJ) and Difficulty Level 2 (AOJ) predict the placement result to some degree, whereas Difficulty Levels 1, 3, and 4 (AOJ) show low F-measures.

Table 7 The best F-measure of each explanatory variable, algorithm with the best F-measure, nDCG of each explanatory variable, name of each explanatory variable, and meaning of each explanatory variable

RQ2: how much does each explanatory variable predict the programming contest ranking?

Table 7 also shows the nDCG as the ranking results. The rankings show tendencies similar to the classification results. Questions about the class (Q1–10) and Psychological Scales do not perform very well. As expected, the number of answered AOJ questions seems to be related to the score because the AOJ problems are similar to those presented in the programming contest. The explanatory variables of the measured metrics also show high nDCG values. These results show that the score of the programming contest is not related to the Psychological Scales or class attitudes, but it is related to the quality of the written source code, which can be measured by source code metrics.

Additionally, Table 8 shows the medians, variances, and p values of each explanatory variable. For example, in the first line, MEDIAN_A is the median value of answers for Q1 in the advanced class, MEDIAN_I is the median value of answers for Q1 in the intermediate class, and MEDIAN is the median value of answers for Q1 from all students. We used Wilcoxon’s signed rank test to calculate the p values, which represent the statistically significant difference between the intermediate class and the advanced class. According to these results, the advanced class students’ scores are better than the intermediate class students’ scores. For example, in the questionnaire about class attitudes (Q1–Q10), more students in the advanced class chose options meaning “Agree” than in the intermediate class. About half of the explanatory variables of the Psychological Scales and the questionnaire about class attitudes show significant differences, whereas all metrics show significant differences. These results show that the advanced class students wrote higher quality code (e.g., smaller LOC and lower complexity) than the intermediate class students. In particular, no intermediate class students solved the problems of Difficulty Levels 3 and 4 (AOJ).

Table 8 Median, variance, and p value (between the intermediate class and the advanced class) for each explanatory variable in the intermediate class, advanced class, and overall

These results indicate that higher-level students can be identified because they can solve such problems and, when this information is combined with other explanatory variables, they should be placed into the advanced class. However, this explanatory variable alone cannot predict the placement result accurately.

RQ3: what is the best combination of the explanatory variables to predict the placement results?

We added explanatory variables one-by-one like a greedy algorithm. The best F-measure has a value of 0.912 (recall is 0.908, precision is 0.943, and specificity is 0.933) with DT using the following nine explanatory variables: (1) Q5 about the ease of understanding class materials, (2) Consistency of Interest, (3) Mastery Orientation, (4) Practical Utility Value, (5) Private Attainment Value, (6) Intrinsic Motivation, (7) Difficulty Level 3 (AOJ), (8) Difficulty Level 4 (AOJ), and (9) Class Fan Out Complexity. The F-measures of these explanatory variables are in bold in Table 7. Adding more explanatory variables actually decreases the F-measure. Table 9 shows the F-measure of each algorithm and the best model. These results show that DT is the best algorithm.

Table 9 F-measure of each algorithm for the best score (five-fold nested cross-validation)

Figure 2 shows the learning curve of DT. The cross-validation score continues to improve as the amount of training data increases and has not saturated, indicating high variance (over-fitting). Thus, employing more training samples should reduce the effect of over-fitting, leading to improvements in this high-variance estimator.

Fig. 2

Learning curve of DT with an F-measure of 0.912

RQ1 implies that the results should contain many explanatory variables based on the measured metrics. However, we did not expect questions about class attitudes, Psychological Scales, and Difficulty Levels 3 and 4 (AOJ) to be included as explanatory variables because they showed low F-measures in the previous section. It is thought that these variables perform well when combined with the other explanatory variables.

RQ4: what is the best combination of explanatory variables to predict the programming contest ranking?

Similar to RQ3, we added explanatory variables one-by-one like a greedy algorithm. The best nDCG has a value of 0.962 with SVM-rank using the following 20 explanatory variables: (1) Q3 about effort to understand the contents, (2) Q10 about whether the class is meaningful, (3) Consistency of Interest, (4) Performance Avoidance, (5) Performance Approach, (6) Avoidance of Competition, (7) Interest Value, (8) Institutional Utility Value, (9) Self-efficacy, (10) Intrinsic Motivation, (11) Total number of answered questions of AOJ, (12) Difficulty Level 1 (AOJ), (13) Difficulty Level 3 (AOJ), (14) Difficulty Level 4 (AOJ), (15) Class Data Abstraction Coupling, (16) Class Fan Out Complexity, (17) NCSS Class, (18) NCSS Method, (19) Npath Complexity, and (20) Too Many Methods. The nDCGs of these explanatory variables are in bold in Table 7. Adding more explanatory variables actually decreases the nDCG.

RQ2 implies that the results should contain many explanatory variables based on the measured metrics. However, some explanatory variables that showed a low nDCG in the previous section are included in this combination. It is thought that these variables perform well when combined with the other explanatory variables. Therefore, Psychological Scales and the questionnaire on class attitude are also effective in combination with the metrics.

In particular, the explanatory variables used for both models such as Consistency of Interest, Intrinsic Motivation, Difficulty Levels 3 and 4 (AOJ), and Class Fan Out Complexity, are considered to have strong relationships with the results.

Threats to validity

The questionnaires were conducted after the placement test. This could affect the results. Moreover, the best combination may be a local solution. These are threats to the internal validity.

These results are from one class. If this experiment is repeated with another group or organization, the results may differ. Furthermore, the amount of data is small. These are threats to the external validity.

Conclusion

Machine learning is used to predict both the placement results without a traditional placement examination and the programming skill ranking without a programming contest. The explanatory variables are Psychological Scales, Programming Tasks, and Student-answered Questionnaires. The target variables are the placement result based on an examination facilitated by a teacher and the ranking results of the programming contest. We investigated how the above three sets of explanatory variables affect the results. Finally, we created a classification model with an F-measure of 0.912 (precision 0.943, recall 0.908) and a ranking model with an nDCG of 0.96172.

If teachers use our method, they can automate evaluations, which may reduce their workload, enhance the education quality, and positively impact students’ class attitude. These are the major contributions and implications of this paper.

However, this research has some limitations. Although our method should be applicable when using the same kinds of variables, its behavior when applied to other datasets has yet to be confirmed. Our model exhibits good performance, but because its recall and specificity are not 100%, how to use and operate this model in actual educational settings remains debatable. For example, we need to consider follow-up when the predictor misclassifies a student. Additional improvements may also be possible; for example, an algorithm superior to those in this study may exist. Regardless of these limitations, our method can be extended to other situations such as companies’ recruitment and placement.

The novelty of our method is that it adds the Psychological Scale to traditional evaluation criteria. Our study enables automatic placement based on a multifaceted evaluation using difficult-to-formulate information. This paper demonstrates the feasibility of evaluations using explanatory variables such as the Psychological Scale, which could not be previously employed in machine learning, and suggests that it may be possible to automate education evaluations. In the future, we plan to improve the prediction performance of our method by enhancing the algorithms and adding other explanatory variables.

Notes

  1. This paper is an extended version of “Student Placement Predictor for Programming Class Using Class Attitude, Psychological Scale, and Code Metrics,” presented at the 25th International Conference on Computers in Education (ICCE 2017). The previous paper only predicted the student placement result. This paper additionally predicts student skill ranking using a programming contest and adds new algorithms for classification. In summary, this paper demonstrates the applicability of our method to a real programming class.

Abbreviations

GE: General evaluation

DoC: Degree of closeness

PSFs: Psychosocial and study skill factors

GPA: Grade point average

RNN: Recurrent neural network

RQs: Research questions

AOJ: Aizu Online Judge

SVM: Support Vector Machine with RBF Kernel

SVML: Support Vector Machine with Linear Kernel

LR: Logistic regression

DT: Decision tree

RF: Random forest

NN: k-nearest neighbors

nDCG: Normalized discounted cumulated gain

References

  • Ahadi, A, Lister, R, Haapala, H, Vihavainen, A (2015). Exploring machine learning methods to automatically identify students in need of assistance. In Proceedings of the Eleventh Annual International Conference on International Computing Education Research, ICER ’15. ACM, New York, (pp. 121–130).
  • Bandura, A (1977). Self-efficacy: toward a unifying theory of behavioral change. Psychological Review, 84(2), 191.
  • Bishop, CM (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Secaucus: Springer-Verlag New York, Inc.
  • Castro-Wunsch, K, Ahadi, A, Petersen, A (2017). Evaluating neural networks as a method for identifying students in need of assistance. In Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education, SIGCSE ’17, Seattle, Washington. ACM, New York, (pp. 111–116).
  • Deci, E, & Ryan, R (1985). Intrinsic Motivation and Self-Determination in Human Behavior. Perspectives in Social Psychology. New York: Springer US.
  • Deci, E, & Ryan, R (2002). Handbook of Self-determination Research. Rochester: University of Rochester Press.
  • Duckworth, A, & Gross, JJ (2014). Self-control and grit: related but separable determinants of success. Current Directions in Psychological Science, 23(5), 319–325.
  • Duckworth, AL, Peterson, C, Matthews, MD, Kelly, DR (2007). Grit: perseverance and passion for long-term goals. Journal of Personality and Social Psychology, 92(6), 1087.
  • Duckworth, AL, & Quinn, PD (2009). Development and validation of the Short Grit Scale (Grit-S). Journal of Personality Assessment, 91(2), 166–174.
  • Eccles, J, & Wigfield, A (1985). Teacher expectancies and student motivation. In: Dusek, JB (Ed.) Teacher Expectancies. Lawrence Erlbaum Associates, Hillsdale.
  • Elliot, AJ, & Church, MA (1997). A hierarchical model of approach and avoidance achievement motivation. Journal of Personality and Social Psychology, 72(1), 218.
  • Hong, JK, Mitrovic, A, Neshatian, K (2015). Predicting quitting behavior in SQL-Tutor. In Proceedings of the 23rd International Conference on Computers in Education (ICCE 2015), Hangzhou. APSCE, Taoyuan.
  • Ida, K (2001). An attempt to construct the academic task values evaluation scale. Bulletin of the Graduate School of Education and Human Development, Psychology and Human Developmental Sciences, 48, 83–95.
  • Ishizue, R, Sakamoto, K, Washizaki, H, Fukazawa, Y (2017a). Student placement predictor for programming class using class attitude, psychological scale, and code metrics. In Proceedings of the 25th International Conference on Computers in Education (ICCE 2017), Taoyuan County, Taiwan. APSCE, Taoyuan.
  • Ishizue, R, Sakamoto, K, Washizaki, H, Fukazawa, Y (2017b). An interactive web application visualizing memory space for novice C programmers (abstract only). In Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education, SIGCSE ’17. ACM, New York, (pp. 710–710).
  • Ishizue, R, Sakamoto, K, Washizaki, H, Fukazawa, Y (2018). PVC: visualizing C programs on web browsers for novices. In Proceedings of the 2018 ACM SIGCSE Technical Symposium on Computer Science Education, SIGCSE ’18, Baltimore. ACM, New York.
  • Jamison, J (2017). Applying machine learning to predict Davidson College’s admissions yield. In Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education, SIGCSE ’17, Seattle. ACM, New York, (pp. 765–766).
  • Joachims, T (2002). Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, (pp. 133–142).
  • Joachims, T (2006). Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, (pp. 217–226).
  • Kamoshida, R, & Sakamoto, K (2016). MALSS: a tool to support data analysis using machine learning for novices. The Institute of Electronics, Information and Communication Engineers, J99-D, 428–438.
  • Kotsiantis, SB, Pierrakeas, CJ, Pintelas, PE (2003). Preventing student dropout in distance learning using machine learning techniques. In: Palade, V, Howlett, RJ, Jain, L (Eds.) Knowledge-Based Intelligent Information and Engineering Systems. Springer Berlin Heidelberg, Berlin, (pp. 267–274).
  • Li, J, Sakamoto, K, Washizaki, H, Fukazawa, Y (2017). Promotion of educational effectiveness by translation-based programming language learning using Java and Swift. In Proceedings of the 50th Annual Hawaii International Conference on System Sciences (HICSS-50), Waikoloa, Hawaii, Jan 4–7. AIS Electronic Library (AISeL), Atlanta.
  • Márquez-Vera, C, Cano, A, Romero, C, Noaman, AYM, Mousa Fardoun, H, Ventura, S (2016). Early dropout prediction using data mining: a case study with high school students. Expert Systems, 33(1), 107–124.
  • McCracken, M, Almstrum, V, Diaz, D, Guzdial, M, Hagan, D, Kolikant, YB-D, Laxer, C, Thomas, L, Utting, I, Wilusz, T (2001). A multi-national, multi-institutional study of assessment of programming skills of first-year CS students. In Working Group Reports from ITiCSE on Innovation and Technology in Computer Science Education, ITiCSE-WGR ’01, Canterbury, UK. ACM, New York, (pp. 125–180).
  • Nishikawa, K, Okugami, S, Amemiya, T (2015). Development of the Japanese Short Grit Scale (Grit-S). Japan Society of Personality Psychology, 24(2), 167–169.
  • Okubo, F, Yamashita, T, Shimada, A, Konomi, S (2017). Students’ performance prediction using data of multiple courses by recurrent neural network. In Proceedings of the 25th International Conference on Computers in Education (ICCE 2017), Christchurch, New Zealand, December 4–8. APSCE, Taoyuan.
  • Ota, N (2010). Construction of a multi-dimensional competitiveness scale. Journal of College of Contemporary Education, 2, 57–65.
  • Robbins, SB, Lauver, K, Le, H, Davis, D, Langley, R, Carlstrom, A (2004). Do psychosocial and study skill factors predict college outcomes? A meta-analysis. Psychological Bulletin, 130(2), 261–288.
  • Ryckman, RM, Hammer, M, Kaczor, LM, Gold, JA (1990). Construction of a hypercompetitive attitude scale. Journal of Personality Assessment, 55(3–4), 630–639.
  • Ryckman, RM, Hammer, M, Kaczor, LM, Gold, JA (1996). Construction of a personal development competitive attitude scale. Journal of Personality Assessment, 66(2), 374–385.
  • Saito, D, Washizaki, H, Fukazawa, Y (2017). Comparison of text-based and visual-based programming input methods for first-time learners. Journal of Information Technology Education: Research, 16, 209–226.
  • Shen, B, Chen, A, Guan, J (2007). Using achievement goals and interest to predict learning in physical education. The Journal of Experimental Education, 75(2), 89–108.
  • Sherer, M, Maddux, JE, Mercandante, B, Prentice-Dunn, S, Jacobs, B, Rogers, RW (1982). The Self-Efficacy Scale: construction and validation. Psychological Reports, 51(2), 663–671.
  • Smither, R, & Houston, J (1992). The nature of competitiveness: the development and validation of the Competitiveness Index. Educational and Psychological Measurement, 52, 407–418.
  • Sohsah, GN, Guzey, O, Tarmanini, Z (2016). Classifying educational lectures in low-resource languages. In Proceedings of the 15th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, New York, (pp. 431–435).
  • Tanaka, A, & Yamauchi, H (2000). Causal models of achievement motive, goal orientation, intrinsic interest, and academic achievement in classroom. Japanese Psychological Research, 71(4), 317–324.
  • Trotman, A, & Handley, C (2008). Programming contest strategy. Computers & Education, 50(3), 821–837.
  • Verdú, E, Regueras, LM, Verdú, MJ, Leal, JP, de Castro, JP, Queirós, R (2012). A distributed system for learning programming on-line. Computers & Education, 58(1), 1–10.
  • Yasuda, K, Kawashima, H, Kimura, H (2016). Automatic scoring of English speaking test using automatic speech recognition. In Proceedings of the 24th International Conference on Computers in Education (ICCE 2016), IIT Bombay, Mumbai, India. APSCE, Taoyuan.
  • Zhao, J, Xie, X, Xu, X, Sun, S (2017). Multi-view learning overview: recent progress and new challenges. Information Fusion, 38, 43–54.


Acknowledgements

I would like to thank Taisuke Yokoi, Remin Kasahara, and the other members of the Washizaki laboratory for their help with my research.

Funding

This work was partially supported by JST Presto grant number JPMJPR14D4.

Availability of data and materials

The datasets generated and/or analyzed during the current study are not publicly available because we do not have permission to publish them, but they are available from the corresponding author on reasonable request.

Author information


Contributions

RI and KS made substantial contributions to the conception and design of this research. In particular, RI mainly wrote and revised the manuscript and performed the acquisition and analysis of the data. HW and YF mainly contributed to the positioning of this research among related work and the comparison between this method and other methods. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Ryosuke Ishizue.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Ishizue, R., Sakamoto, K., Washizaki, H. et al. Student placement and skill ranking predictors for programming classes using class attitude, psychological scales, and code metrics. RPTEL 13, 7 (2018). https://doi.org/10.1186/s41039-018-0075-y



  • DOI: https://doi.org/10.1186/s41039-018-0075-y
