The incubation effect among students playing an educational game for physics


The incubation effect (IE) is a problem-solving phenomenon composed of three phases: pre-incubation where one fails to solve a problem; incubation, a momentary break where time is spent away from the unsolved problem; and post-incubation where the unsolved problem is revisited and solved. Literature on IE was limited to experiments involving traditional classroom activities. This initial investigation showed evidence of IE instances in a computer-based learning environment. This paper consolidates the studies on IE among students playing an educational game called Physics Playground and presents further analysis to examine the incidence of post-incubation or the revisit to a previously unsolved problem. Prior work, which focused on predicting successful outcomes, includes a coarse-grained IE model developed with logistic regression on aggregated data and an improved model which leveraged long short-term memory (LSTM) combined with dimensionality reduction visualization technique and clustering on fine-grained data. The additional analysis which aims to understand factors that may trigger the post-incubation phase also used fine-grained data and LSTM to create a revisit model. Results show that time elapsed relative to the activity period and encountering a problem with a similar solution during incubation were possible factors in revisiting previously unsolved problems.


Studies show that taking a break when one is stuck in a problem-solving activity may facilitate the solution process (Fulgosi & Guilford, 1968; Gilhooly, Georgiou, & Devery, 2013; Penaloza & Calvillo, 2012; Sio & Ormerod, 2015). This momentary break, called incubation (Sio & Ormerod, 2015), may trigger an internal mental process which associates new information with past information to generate solution ideas (Medd & Houtz, 2002). In the context of education, students who reach an impasse in a problem-solving activity may temporarily engage in another task, after which they return to the original problem and find a solution. When the student solves the problem after incubation, the phenomenon, together with its successful result, is called the incubation effect (IE).

IE is divided into three phases (Gilhooly, Georgiou, & Devery, 2013): (a) pre-incubation phase, (b) incubation phase, and (c) post-incubation phase. The pre-incubation phase consists of the failed attempts to solve a problem where one usually gets stuck. The incubation phase begins when the learner decides to take a break from the problem and either engages in a similar task, engages in a different task, or simply rests. The post-incubation phase starts when the learner goes back to the original problem and tries to solve it again. The benefits of incubation prompted researchers to incorporate breaks into educational activities, which has been shown to have positive results (Lynch & Swink, 1967; Medd & Houtz, 2002; Rae, 1997; Webster, Campbell, & Jane, 2006). Earlier work on IE (Ellwood, Pallier, Snyder, & Gallate, 2009; Fulgosi & Guilford, 1968; Gilhooly et al., 2013; Penaloza & Calvillo, 2012; Sio & Ormerod, 2015) investigated specific factors that could lead to successful incubation in the context of classroom tasks and suggested that engaging in a different activity may produce a better outcome. On the other hand, Penney, Godsell, Scott, and Balsom (2004) claimed that engaging in a task of a similar nature would promote priming, which allows students to realize the correct solution to the problem, while Segal (2004) argued that the task during incubation has no effect on its outcome.

The incidence of the incubation effect has also been investigated in the context of a computer-based environment called Physics Playground (PP), a two-dimensional game designed to help high school students better understand concepts in Physics. Initial work (Martinez, Obispo, Talandron, & Rodrigo, 2016) found evidence that taking a break helped some students to solve a problem in which they were previously stuck. Talandron, Rodrigo, and Beck (2017) attempted to model IE and examined possible factors that predict the successful outcome of incubation; the model was able to predict IE but also tended to produce false positives. To further explore IE on PP, analysis was conducted to improve the detection of unsuccessful incubation (Talandron & Rodrigo, 2018). Both studies were limited to hand-crafted features from aggregated data, which means that individual attempts comprising the 3 phases of IE were not analyzed. In Talandron-Felipe and Rodrigo (2019), a fine-grained level analysis was conducted and the actual activity during incubation was taken into consideration. The common focus of these studies was to model the result of the post-incubation phase and determine the features that contribute to a successful incubation. However, it is also important to understand the incidence of revisit, which is when the student decides to return to a previously unsolved problem after taking a break and playing other levels.

Aside from consolidating the findings of previous work on IE in Physics Playground, this paper aims to present additional analysis on what triggers the post-incubation phase of IE by determining features that predict the incidence of revisit to a previously unsolved problem. To realize the objective, this research answers the question: what features impact post-incubation?

The rest of the paper is organized into three sections. The literature review discusses the different theories surrounding IE. The second section, Incubation Effect in Computer-Based Learning Environment, presents previously published works on IE in the context of Physics Playground. The last section consists of the current work’s methods, results, and discussion. The paper is concluded with a summary and the limitations of the study as well as future work recommendations.

Literature review

The term “incubation” was first mentioned in Graham Wallas’ model of the creative process in 1926. It was defined as a period of unconscious processing, during which no conscious effort is exerted upon the unsolved problem. As mentioned, the positive outcome of the incubation period is called the incubation effect (IE). Based on this, Gilhooly et al. (2013) came up with the three phases of IE: pre-incubation, incubation, and post-incubation. Mapping these phases to Wallas’ model, the pre-incubation phase is equivalent to the preparation stage, where one attempts to solve the problem and which entails part research, part planning, and part entering the right frame of mind and attention. The second phase is similar to the incubation stage of Wallas (1926). Both describe it as a period of abstention from the unsolved problem that may be spent either in conscious mental work on another problem or in relaxation. Wallas (1926) suggested that spending time on another problem economizes time and is often the better option. In Wallas’ model, incubation is followed by a brief period called illumination, when ideas for a solution arise, and then by the verification stage, where one carries out the solution; the latter is called the post-incubation phase in Gilhooly et al. (2013).

Following Wallas, there have been experiments using different types of problems with the goal of understanding the underlying factors that affect the outcome of incubation. These studies and theories of the incubation effect are discussed in the succeeding subsections. Different studies examine many factors of the incubation effect such as activity before incubation, the type of problem employed, clues during incubation, length of the incubation period, and type of activities done during the incubation period (Dodds, Ward, & Smith, 2004). Gilhooly et al. (2013) outlined four main approaches in understanding what happens during the incubation period.

Intermittent conscious work

This theory suggests that even though incubation is supposed to be a period without conscious work on the unsolved problem, the solver may still carry out intermittent conscious work (Mayer et al., 1995; Weisberg, 2006). The student leaves the problem and engages in another task but relates the present task to the unsolved problem and, while solving the problem at hand, consciously thinks about how it could help with the previously unsolved problem. It was hypothesized in Dodds et al. (2004) that doing this could lead to a deficit in performance on the present task. This was neither confirmed nor disproven until Gilhooly, Georgiou, Garrison, Reston, and Sirota (2012) dismissed the hypothesis and showed that doing intermittent conscious work on the unsolved problem had no negative effects on the activities during incubation.

Unconscious work

The theory of doing unconscious work during incubation was first used in the context of problem-solving in Poincaré (1913) and was later referred to as non-conscious idea generation (Snyder, Mitchell, Ellwood, Yates, & Pallier, 2004) or unconscious thought (Dijksterhuis & Nordgren, 2006). Poincaré (1913) explained that effort exerted on failed attempts on a problem might be utilized if conscious work is interrupted and rest is given to the mind. During this rest, the brain carries out unconscious work, and the result of this work will afterward reveal itself. Another remark made about unconscious work is that it is only beneficial if it is preceded and then followed by a period of conscious work. The prior conscious work provides the inspiration and information which is used unconsciously during incubation. The benefits of this unconscious idea generation can then manifest only if it is followed by conscious work on the problem.

Beneficial forgetting

Smith and Blankenship (1991) reported that fixation, where one sticks to a specific solution, may block successful problem-solving and may develop during the initial solution attempts. In this approach, it was proposed that incubation plays an important role in creating distractions from the fixation. During incubation, wrong assumptions and strategies that were fixed in the mind of the solver should be weakened through forgetting and thus a fresh start happens when the solver resumes the unsolved problem (Gilhooly et al., 2012).

Gilhooly et al. (2013) explored the beneficial-forgetting and unconscious-work theories and the effect of differential fatigue relief, where doing a different activity may relieve the solver from being tired. In their experiment, the task was to find new ways to use a familiar object. The result suggested that it was helpful for respondents to put aside the task immediately after failing and return to it after a certain period, allowing unconscious incubation processes to operate during the break before going back to the problem and applying conscious effort.

Attention withdrawal

In this theory, Segal (2004) emphasized that the only function of incubation is to divert the solver’s attention from the unsolved problem, thus releasing the mind from making further false assumptions and avoiding the development of a wrong fixation. Total withdrawal of attention enables the solver to apply a new assumption to the problem after taking the break. This approach supports the beneficial forgetting theory but is in contrast to the intermittent conscious work and the unconscious work theories.

Incubation effect in a computer-based learning environment

All of the prior work in incubation effect involved experiments conducted in a classroom or laboratory setting. However, in recent years, the use of computer-based learning environments to foster learning has been increasing (Azevedo, 2005; Corte, Erik, Mandl, & Verschaffel, 2013; Polson & Richardson, 2013). One of these platforms is a game-based learning environment (GBLE) which has both gaming and learning outcomes embedded in the system (Royle, 2008). These kinds of environments allow for a stealth assessment on students’ learning behaviors (Shute & Ventura, 2013).

Investigating IE in Physics Playground

Physics Playground (PP), formerly known as Newton’s Playground, is a two-dimensional computer-based game designed for students at the secondary level to better understand the concepts of qualitative Physics. The game simulates how physical objects operate in relation to Newton’s laws of motion: balance, mass, conservation and transfer of momentum, gravity, and potential and kinetic energy (Shute & Ventura, 2013). The game has different problems with varying levels of difficulty and solutions. The main objective of each problem is to guide a green ball to a red balloon. To solve each level, the players must draw objects (i.e., ramp, lever, pendulum, springboard) using the computer mouse, and these objects become part of the game environment. Figure 1a shows an example level of PP which requires a ramp to lead the ball to the balloon. All objects drawn obey the basic rules of physics relating to gravity and Newton’s three laws of motion (Shute & Ventura, 2013). Once the player draws a ramp, the ball will then follow its path until it reaches the red balloon as shown in Fig. 1b.

Fig. 1

A sample level in Physics Playground. (a) A level that requires a ramp. (b) Sample solution to a ramp problem

By drawing these simple machines, students are expected to gain an understanding of how these objects adhere to the laws of physics as agents of force and motion. A student who solves a level receives a badge: a gold badge (Fig. 2a) if the problem was solved using a number of objects at or below the par set by the game designers, and a silver badge (Fig. 2b) otherwise.

Fig. 2

An example of the gold and silver badges as award for solving the level. (a) A gold badge was awarded. (b) A silver badge was awarded

Some levels in PP were designed with only one ideal solution while others have 2 to 3 possible solutions. These types of problems are known as insight problems and divergent problems, respectively. Insight problems have a single solution, but the solver has to develop a new way of representing the task in order to reach it, while divergent problems have more than one solution but require creativity in order to arrive at one of them (Gilhooly et al., 2012). Most experiments studying the incubation effect were limited to insight problems. It is also important to note that even though these are called levels, the problems were not necessarily arranged by difficulty. The game also has an open-access feature where students are allowed to choose any level at any time.

Initial work (Martinez et al., 2016; Talandron et al., 2017) on IE in PP used interaction logs of 60 eighth grade or 2nd year high school students from Baguio City, Philippines, who played the game for around 2 h. The interactions of each player with PP were tracked and automatically logged into a file. In order to map students’ actions with the 3 phases of IE, the following events were examined:

  • Level Start. Player starts a level attempt.

  • Level Restart. Player resets the level to start another attempt.

  • Level End. Player completes a level and PP gives out a badge for the specific agent used.

  • Badge. A visual representation (i.e., gold or silver) awarded to a player for completing a level.

  • Menu Focus. Player returns to the main menu.

The 3 IE phases were then operationalized in the context of PP as follows:

  • Pre-incubation Phase—The player attempts a level, X, indicated by the event “Level Start” but the player fails and decides to leave the level as indicated by the event Level End—Badge: “None.”

  • Incubation Phase—After leaving level X, the player takes a break and returns to the menu as indicated by the event Menu Focus or plays a different level as indicated by the event Level Start or watches the tutorial.

  • Post-incubation Phase—The player returns to level X and attempts to solve it again. This is indicated by the event Level Start.

When these 3 phases were present for a certain level, the occurrence was labeled as Potential IE. If the player earned a badge, whether gold or silver, during the post-incubation phase, the attempt was considered IE-True. However, if no badge was awarded at the end of the post-incubation phase, it was labeled as IE-False. The presence of the 3 phases in a level was counted as one Potential IE only irrespective of the number of breaks and revisits to level X. That means there can only be 1 Potential IE per level per player as shown in Fig. 3.
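The labeling rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `label_potential_ie` and the simplified input (one `(level, badge)` tuple per attempt, in play order) are assumptions, and only "playing a different level" is modeled as incubation (Menu Focus and tutorial breaks are left out).

```python
def label_potential_ie(attempts):
    """Label each level that shows all 3 IE phases as IE-True or IE-False.

    attempts: list of (level, badge) tuples in play order, where badge is
    "gold", "silver", or None. This input format is a hypothetical
    simplification of the PP interaction logs.
    """
    state = {}  # level -> "failed" | "incubating" | "solved" | "IE-True" | "IE-False"
    for level, badge in attempts:
        # Playing this level is a break from every other level left unsolved.
        for other in list(state):
            if other != level and state[other] == "failed":
                state[other] = "incubating"
        status = state.get(level)
        if status in ("incubating", "IE-False"):
            # Post-incubation attempt: any badge makes the instance IE-True.
            state[level] = "IE-True" if badge else "IE-False"
        elif status in (None, "failed"):
            # Still in pre-incubation (or solved without ever taking a break).
            state[level] = "solved" if badge else "failed"
        # "solved" and "IE-True" stay as they are: later attempts are replays.
    # At most one Potential IE per level, regardless of the number of revisits.
    return {lvl: s for lvl, s in state.items() if s in ("IE-True", "IE-False")}


session = [("A", None), ("B", "gold"), ("A", "silver"),
           ("C", None), ("C", None), ("D", None), ("C", None)]
labels = label_potential_ie(session)
print(labels)
```

In this made-up session, level A is left unsolved, revisited after playing B, and solved, while level C is revisited after playing D but still not solved. Level D is never revisited, so it has no post-incubation phase and is not a Potential IE.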

Fig. 3

IE Identification (Martinez et al., 2016)

Martinez et al. (2016) found evidence that students’ IE success rates exceeded their non-IE success rates, implying that IEs may indeed help students who are stuck. Among the 60 players in the study, 37 exhibited potential IE and had an average IE success rate of 75%, while attempts that had no incubation had an average success rate of 66%. It was also reported that frustration was associated with the occurrence of potential IEs and that length of incubation was not associated with IE-True.

Modeling IE on a coarse-grained level

A coarse-grained analysis (Talandron et al., 2017) established the baseline of the possible factors that predict the incubation effect. The identification of IE was adopted from Martinez et al. (2016), with the modification that multiple breaks leading to multiple pairs of pre-incubation and post-incubation phases were counted as separate potential IEs, as shown in Fig. 4. The total number of pairs was computed with the nth triangular number formula, n(n+1)/2, where n is the number of revisits. The highest number of revisits was 6, which resulted in a total of 21 pairs: 6 counts of IE-True and 15 counts of IE-False, which means IE-False instances were counted 2.5 times as often as IE-True. This was just one instance and admittedly was not excluded from the analysis. To further show how this counting could have affected the analysis: 5 revisits would result in 15 pairs (10 IE-False and 5 IE-True), 4 revisits in 10 pairs (6 IE-False and 4 IE-True), and 3 revisits in 6 pairs (3 IE-False and 3 IE-True). As the number of revisits decreases, the bias diminishes. At this point, it is important to note that the average number of revisits is 1.5 with a standard deviation of 1.2. This IE identification method also allows for the analysis of the varying length of incubation at different points of revisit for the same problem, as well as the analysis of the impact of encountering the same problem during incubation for the outer pairs.
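The pair counts quoted above can be reproduced with a short sketch. The function name `count_ie_pairs` is hypothetical, and the split into IE-True and IE-False rests on an assumption inferred from the quoted numbers: only pairs that end at the final, successful revisit count as IE-True, so n revisits give n IE-True pairs.

```python
def count_ie_pairs(revisits):
    """Return (total_pairs, ie_true, ie_false) for a problem revisited n times.

    Assumption (inferred from the counts in the text, not stated outright):
    only the last revisit is successful, so each of the n earlier attempts
    paired with it yields one IE-True and every other pair is IE-False.
    """
    total_pairs = revisits * (revisits + 1) // 2  # n-th triangular number
    ie_true = revisits
    ie_false = total_pairs - ie_true
    return total_pairs, ie_true, ie_false


for n in (3, 4, 5, 6):
    total, n_true, n_false = count_ie_pairs(n)
    # The IE-False to IE-True ratio grows with the number of revisits.
    print(n, total, n_true, n_false, n_false / n_true)
```

Under this assumption the ratio of IE-False to IE-True pairs is (n - 1)/2, which is why the bias is small for the typical case of one or two revisits and largest for the single 6-revisit instance.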

Fig. 4

Modified IE identification (Talandron et al., 2017)

The coarse-grained analysis produced features of IE that match the theories from prior work (Talandron et al., 2017), such as the productivity or success rate of students (more badges earned), problem difficulty (at least 65% of attempts on the problem were successful), total number of problems attempted (at most 30), and time period of revisit (within the first half of the session). It showed that a learner’s problem-solving ability, represented by badges earned prior to the post-incubation phase, and the problem’s level of difficulty were both factors in the incidence of IE-True, which is consistent with the findings of Sio and Ormerod (2009) and Smith and Blankenship (1991) in the earlier literature on IE. It was also found that both the incidence of incubation in the later parts of a problem-solving session and the number of attempts made prior to post-incubation have a negative relationship with IE-True. Even though these features provided insights, they were manually engineered from aggregated data, which could have been the cause of the coarse-grained model’s relatively low performance (recall = 89.61%, precision = 50.73%, f1-score = 64.79%, and kappa = 0.22).
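As a quick consistency check on the reported metrics, the f1-score can be recomputed from precision and recall using the standard harmonic-mean definition. The helper name `f1_score` below is ours, not taken from the study's code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    return 2 * precision * recall / (precision + recall)


# Reported coarse-grained values: precision = 50.73%, recall = 89.61%.
coarse_f1 = f1_score(precision=0.5073, recall=0.8961)
print(f"coarse-grained f1 = {coarse_f1:.2%}")
```

The recomputed value is about 64.8%, which agrees with the reported 64.79% up to rounding of the inputs. The gap between high recall and low precision is what the text describes as the model's tendency to predict false positive IEs.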

Due to the difficulty in accurately predicting IE-True, focus was shifted instead on investigating IE-False instances in an attempt to understand unproductive incubation as this information could help educators identify what factors in the context of IE should be avoided. Talandron and Rodrigo (2018) used a combination of t-SNE dimensionality reduction and x-means clustering techniques and found 3 features that led to failure during revisit: 1) a very lengthy incubation duration which meant spending more than 40 min away from the unsolved problem in a 2-h session, 2) lower success rate prior to revisit which meant solving less than 26% of the problems, and 3) doing more than 2 attempts on the problem during revisit.

Modeling IE on a fine-grained level

The previous model was limited to hand-crafted features from aggregated data: individual attempts comprising the 3 phases of IE were not analyzed at a fine-grained level, and the actual activity during incubation was not taken into consideration. Further analysis (Talandron-Felipe & Rodrigo, 2019) was therefore conducted on a fine-grained level. The study used a larger dataset collected from a total of 176 public and private high school students in the Philippines. The data was structured for the specification of the timestep and batch size such that the timestep corresponds to the number of actions to solve a problem and the batch size to the number of problems or attempts per student. The model was developed using Keras, a high-level neural networks API on top of TensorFlow, with Python as the underlying programming language. The input features were the time of the action, the problem ID, and the series of actions in the attempts to solve the problem. The output label was one of IE-True (revisited a previously unsolved problem and then solved it), IE-False (revisited a previously unsolved problem but was still unable to solve it), or other, which could be a new problem (i.e., the player has not encountered the problem level before) or a replay (i.e., the player goes back to a previously solved problem).

The model performed better (recall = 91.62%, precision = 82.55%, f1-score = 86.84%, kappa = 0.821) than the coarse-grained model. Due to the abstract nature of the resulting model using a deep learning approach, Talandron-Felipe and Rodrigo (2019) used t-SNE and X-means clustering on the features derived from the input data. The actual IE-True and IE-False instances and the model’s predictions were separately plotted on a two-dimensional graph using t-SNE (see Fig. 5). The model’s performance is also reflected in the similarity between the two graphs, and clusters are also apparent.

Fig. 5

t-SNE plot of IE instances (Talandron-Felipe & Rodrigo, 2019). (a) Actual IEs. (b) Predicted IEs

X-means was used on the t-SNE results to further distinguish IE-True and IE-False (see Fig. 6). Comparing Fig. 5a and b with Fig. 6a and b, cluster 1 is predominantly composed of IE-True. A quantitative analysis was done for all the features derived from the LSTM input data in order to extract distinct features for cluster 1 of the prediction results.

Fig. 6

The clustering results of both actual IEs and predicted IEs (Talandron-Felipe & Rodrigo, 2019). (a) Clusters from actual IEs. (b) Clusters from prediction results

The study found five features associated with the incidence of IE-True:

  1. Time of revisit relative to the session period: a t-test showed that there was a significant effect of the time of revisit on the clusters at the p < 0.05 level [F (1, 321) = 7.01, p = 0.008], which indicates that incubation in the early part of a time-limited session is more likely to be beneficial.

  2. Duration of incubation (within the session period): a significant difference was found between cluster 1 (mean = 10.95, sd = 17.99) and cluster 2 (mean = 16.06, sd = 20.81) at the p < 0.05 level [F (1, 321) = 5.52, p = 0.02]. This is consistent with previous work where a prolonged break could lead to IE-False (Talandron et al., 2017) and, specifically, where an incubation duration of more than 40 min in a 2-h session resulted in IE-False (Talandron & Rodrigo, 2018).

  3. Problem difficulty: the difference was significant between cluster 1 (mean = 39.36%, sd = 16.64%) and cluster 2 (mean = 45.39%, sd = 17.33%) at the p < 0.05 level [F (1, 321) = 10.06, p = 0.002]. This is an intuitive feature, similar to the finding of the coarse-grained model that revisiting a problem with a lower difficulty rate is more likely to result in IE-True.

  4. Student’s productivity at the time of revisit: this refers to the number of problems solved over all attempts made at the time of revisit. There was a significant difference between cluster 1 (mean = 64.48%, sd = 17.89%) and cluster 2 (mean = 56.97%, sd = 19.75%) at the p < 0.05 level [F (1, 321) = 12.30, p < 0.001]. When IE instances were compared at every 25% productivity interval, it was found that IE-True is more likely to occur if productivity is more than 50% at the time of post-incubation.

  5. Similarity with the preceding problem: a chi-square test of independence showed a significant association between the clusters and the problem type, χ² (1, N = 323) = 39.55, p < .001, such that being preceded by a problem with a similar solution was considered a significant factor in predicting IE-True.

In terms of game actions, cluster 1 had a significantly higher incidence of erase (cluster 1 mean = 10.89, cluster 2 mean = 6.55 at the p < 0.05 level [F (1, 321) = 7.77, p < 0.01]), an indication of better awareness when incorrect drawings were made, and of hover tutorial (cluster 1 mean = 0.41, cluster 2 mean = 0.11 at the p < 0.05 level [F (1, 321) = 4.68, p < 0.05]), which means the students were making sure that they were drawing the object correctly. Cluster 1 also had a lower incidence of pause (cluster 1 mean = 0.39, cluster 2 mean = 1.21 at the p < 0.05 level [F (1, 321) = 143.42, p < 0.01]), an indication that the students were more confident in what they were doing.

Within cluster 1, these actions were further analyzed to determine whether there was any difference between pre-incubation and post-incubation. The change was indeed significant for erase (cluster 1 pre-incubation mean = 0.03, cluster 1 post-incubation mean = 0.05 at the p < 0.05 level [F (1, 381) = 6.72, p = 0.009]) and for pause (cluster 1 pre-incubation mean = 0.08, cluster 1 post-incubation mean = 0.03 at the p < 0.05 level [F (1, 381) = 109.19, p < 0.001]).

Similarly, an improvement was also reported in cluster 1 in terms of the students’ drawing of ramps (pre-incubation mean = 0.02, post-incubation mean = 0.07, at the p < 0.05 level [F (1, 381) = 14.72, p < 0.001]) and springboards (pre-incubation mean = 0.007, post-incubation mean = 0.025, at the p < 0.05 level [F (1, 381) = 7.87, p = 0.005]).

Understanding the post-incubation phase

This section presents the additional analysis focusing on the occurrence of the post-incubation phase. It aims to determine the factors that triggered the students to revisit previously unsolved problems after spending some time playing with other levels. Having explored the features that may predict IE-True, it is equally important to understand further the underlying factors that led to the post-incubation phase.


The additional analysis used the same dataset as Talandron-Felipe and Rodrigo (2019). It was composed of the interaction logs of 29 students from a public junior high school in Baguio City (School 1); 31 students from a private university also in Baguio City (School 2); 56 students from a private university in Cebu (School 3); and 60 students from a private university in Davao City (School 4). The distribution of participants in terms of gender and age is shown in Table 1. These students were considered average in terms of their academic performance.

Table 1 Distribution of participants in terms of age and gender

The students were given an orientation to introduce them to Physics Playground and to explain the game mechanics. Before playing, they were asked to answer a pre-test composed of 16 multiple-choice questions worth 1 point each about simple machines and laws of Physics in relation to the learning objectives of PP. Their pre-test scores showed that the students were homogeneous in their prior knowledge (mean = 6.84, median = 7.00, mode = 7.00, sd = 1.99).

The students were then assigned to computers within a computer lab and given about 2 h to play Physics Playground. As they were playing, their interactions within the game were automatically recorded into a log file. The researchers were present in the lab while the students were playing, but they did not prescribe which problems the students had to solve. Students were free to choose the problems they attempted to solve. They were free to leave the problem and return to it at a later time. They solved as many problems as they could within the given time. The session was not designed to investigate or gather data for a specific construct and so they were not given any specific strategies. The method of data gathering was designed for a stealth observation of students’ actions through the interaction logs. After the session, the students answered a post-test which was also based on the topics covered in PP.

While playing PP, students’ interactions with the game were automatically recorded along with each action’s time stamp. The logs include the level, start time, end time, objects drawn, and badge, among others, from which further information can be derived as in prior work, such as attempt duration, number of restarts and revisits, sequence of levels, and number of badges earned.

The actions recorded were divided into 4 categories: Menu Events, Level Events, Play Events, and Agent Events. Menu Events refer to interactions when the player is in the main menu of the game while Level Events are actions related to each individual level within a playground. Play Events are the player’s interactions within the PP environment once the player started to play a specific level. Agent Events refer to the interactions of and with the objects or simple machines drawn by the player to solve the level. Table 2 shows the different events recorded for each category and their description.

Table 2 Description of events generated in PP

Operationalizing IE in the PP interaction logs was done in a hierarchical manner similar to the fine-grained analysis in Talandron-Felipe and Rodrigo (2019). Figure 7 shows the levels of analysis as well as the relationship of the entities. The coarse-grained level analysis involved features from levels 1 to 3 of the diagram, and the fine-grained analysis included features on levels 4 and 5.

Fig. 7

Levels of analysis (Talandron-Felipe & Rodrigo, 2019)

For the task of investigating the incidence of the post-incubation phase, the input data included the problem ID, the series of actions taken to solve the problem, and the result, which indicates whether the student solved the problem or not. The canonical solution to each problem, which was the basis of the problem type, was also integrated into the logs. To prepare the data for modeling using LSTM, it is essential to structure the data for the specification of the timestep and batch size such that the timestep corresponds to the number of actions to solve a problem and the batch size to the number of problems or attempts per student. Since the number of actions was not the same for all attempts, the data were padded with 0’s to make them uniform in length. A similar technique was applied to the number of problems per student, and this number was used as the batch size. The input vector was composed of 8500 rows (100 attempts each for 85 students). The task was then framed as supervised multiclass classification. The input features include the time the attempt started (t), the current problem ID at time t (p_t), the type of canonical solution for the specific problem (s_{p_t}), and the result of the current problem (r_{p_t}).
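The zero-padding step can be illustrated with a small, library-free sketch. The function name `pad_to_uniform_length` and the integer action codes are hypothetical. The text does not say whether the zeros were placed before or after the real actions, so padding at the end is an assumption here, as is reserving 0 for padding only.

```python
def pad_to_uniform_length(sequences, pad_value=0):
    """Right-pad every sequence with pad_value up to the longest length.

    Assumes real action codes start at 1 so that 0 is free to mean "no action".
    """
    longest = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (longest - len(seq)) for seq in sequences]


# Three attempts with different numbers of (made-up) encoded actions.
attempts = [[3, 1, 2], [4], [5, 6]]
padded = pad_to_uniform_length(attempts)
print(padded)
```

The same idea applies one level up: students with fewer than 100 attempts would receive all-zero attempts until every student contributes the same number of rows.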

A further step was the conversion of the data from string to numeric type. The LabelEncoder class from the Scikit-learn library finds all classes and assigns each a numeric ID starting from 0. For the output labels, the to_categorical utility from Keras (np_utils.to_categorical) was used to convert the array of labeled data (from 0 to nb_classes-1) to one-hot vectors.
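What these two transformations do can be sketched without any dependencies. The functions below are hypothetical stand-ins that mimic the behavior described above (sorted classes mapped to IDs from 0, then one-hot rows); they are not the library implementations, and the sample labels are invented.

```python
from typing import Dict, List, Sequence


def fit_label_ids(labels: Sequence[str]) -> Dict[str, int]:
    """Map each distinct label to an integer ID starting from 0, in sorted order."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}


def one_hot(ids: Sequence[int], nb_classes: int) -> List[List[int]]:
    """Convert IDs in 0..nb_classes-1 into one-hot rows."""
    return [[1 if j == i else 0 for j in range(nb_classes)] for i in ids]


outcomes = ["new", "replay", "new", "revisit"]  # hypothetical output labels
mapping = fit_label_ids(outcomes)
encoded = [mapping[o] for o in outcomes]
targets = one_hot(encoded, nb_classes=len(mapping))
```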

The LSTM model was developed using Keras, a high-level neural networks API that runs on top of TensorFlow, with Python as the underlying programming language. To realize the objectives, this study focused on the following task:

  • Given a sequence of a student’s previous attempts on different problems from the start of the session to the current time, predict whether the attempt at time t+1 will be on a new problem, a replay, or a revisit, based on the following definitions:

    • New: the problem at time t+1 is not equivalent to any problem in the sequence of previous attempts (i.e., the player has not attempted the problem since the start of the session)

    • Replay: the problem at time t+1 is a previously solved problem OR is the same as the problem at time t, which means that no incubation or break occurred

    • Revisit: the problem at time t+1 is not the same as the problem at time t AND has been previously attempted but not solved
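The three definitions above can be sketched as a labeling function over one student's attempts in time order. The function name, the `(problem_id, solved)` tuple layout, and the sample session are assumptions made for illustration; the first attempt of a session has no history and therefore falls under "new".

```python
from typing import List, Optional, Set, Tuple


def label_attempts(attempts: List[Tuple[str, bool]]) -> List[str]:
    """Label each attempt as 'new', 'replay', or 'revisit'.

    `attempts` holds (problem_id, solved) pairs for one student, in time order.
    """
    labels: List[str] = []
    attempted: Set[str] = set()
    solved: Set[str] = set()
    previous: Optional[str] = None  # problem played at time t
    for problem, was_solved in attempts:
        if problem not in attempted:
            labels.append("new")
        elif problem == previous or problem in solved:
            # Same problem again (no break) or a problem already solved.
            labels.append("replay")
        else:
            # Different from the problem at time t, attempted before, never solved.
            labels.append("revisit")
        attempted.add(problem)
        if was_solved:
            solved.add(problem)
        previous = problem
    return labels


# Hypothetical session: A fails twice, B is solved, A is revisited and solved.
session = [("A", False), ("A", False), ("B", True),
           ("A", True), ("B", False), ("C", False)]
labels = label_attempts(session)
```

Checking the replay conditions before the revisit condition keeps the classes mutually exclusive, since a revisit requires both a change of problem and an unsolved history.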

Other specifications for the LSTM neural network are the following:

  • In the initial test, the network has one layer. Since the analysis was experimental in nature, considering that standards in terms of values of hyperparameters have yet to be established, multiple iterations were done to find the optimal number of hidden layers, neurons, and appropriate values of other hyperparameters. In every iteration, the considerations included: improvement in terms of loss, computing duration, computing resources, and model performance.

  • To predict the output label, the hidden state at the last timestep was passed through a fully connected layer and a subsequent softmax layer. The batch size was the number of attempts per student, which was 100, and the timestep was set to 1 so that the network would consider each attempt as it backpropagates when calculating gradients for weight updates.

  • Overfitting was monitored, and the dropout rate and the number of epochs were varied as needed. The number of epochs was finalized at 200.

  • A non-linear activation function was used so that the model could adapt to a variety of data and differentiate between the outputs. The initial choice was the ReLU (Rectified Linear Unit) activation function, which has been found to greatly accelerate the convergence of stochastic gradient descent compared to the sigmoid and tanh functions. However, ReLU units can be fragile during training and can “die.” For example, a large gradient flowing through a ReLU neuron could cause the weights to update in such a way that the neuron never activates on any data point again. If this happens, the gradient flowing through the unit will be zero from that point on. That is, ReLU units can irreversibly die during training since they can get knocked off the data manifold. As a solution, ReLU was replaced with Leaky ReLU in the subsequent iterations. Leaky ReLU is one attempt to fix the “dying ReLU” problem: instead of the function being zero when x < 0, a leaky ReLU has a small slope of about 0.01 for negative inputs.

  • Since both models are multi-class classification in nature, categorical cross entropy was used as the loss function and Adam was used for the optimizer during the compilation of the model. For the output layer, the Softmax function was used as the activation function which returns the probabilities of each class.

  • Another issue that had to be addressed was class imbalance. The majority of attempts (71%) were on a new problem, 20% were replays, and only 9% were revisits. This was addressed using sklearn.utils.class_weight.compute_class_weight from the Scikit-learn library, which computes appropriate class weights from the given training data. The computed values were stored in a dictionary, which was then applied during training.

  • Student-level cross-validation was done to ensure that each student’s data appeared in either the training set or the testing set, but not both.
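Two of the specifications above, class weighting and student-level splitting, can be sketched in plain Python. The weighting function uses the common "balanced" heuristic, n_samples / (n_classes × class count); whether the study used that mode is an assumption. The fold assignment is a hypothetical round-robin over students, not the authors' procedure.

```python
from collections import Counter
from typing import Dict, List, Sequence


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def student_folds(student_ids: Sequence[str], n_folds: int) -> List[List[int]]:
    """Assign row indices to folds so that all rows of a student share one fold."""
    students = sorted(set(student_ids))
    fold_of = {s: i % n_folds for i, s in enumerate(students)}
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for row, student in enumerate(student_ids):
        folds[fold_of[student]].append(row)
    return folds


# Labels in the reported proportions: 71% new (0), 20% replay (1), 9% revisit (2).
labels = [0] * 71 + [1] * 20 + [2] * 9
weights = balanced_class_weights(labels)

# Hypothetical student IDs for six rows, split into two folds.
rows = ["s1", "s1", "s2", "s3", "s3", "s2"]
folds = student_folds(rows, n_folds=2)
```

With these proportions the rarest class (revisit) receives the largest weight, so errors on revisits contribute more to the loss. A dictionary of this form is the shape that the `class_weight` argument of a Keras `fit` call accepts.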

To answer the research question, quantitative analysis was performed on the features derived from the input data in relation to the model’s prediction results.

Results and discussion

This section presents the results of the additional analysis to determine features that predict the incidence of revisit to a previously unsolved problem by understanding how the problems or levels played during the incubation period may have influenced the incidence of post-incubation.

The revisit model—What features predict post-incubation?

The model performed well in correctly predicting the occurrence of revisits (recall = 88.85%, precision = 83.19%, f1-score = 85.93%, kappa = 0.746) as shown in the confusion matrix (see Table 3).

Table 3 Revisit model confusion matrix
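For reference, the reported metrics can be derived from a confusion matrix as sketched below. The 3×3 counts are invented for illustration and are not the values in Table 3, and it is an assumption that kappa was computed over all three classes. The last line checks that the reported precision and recall are consistent with the reported F1-score.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # rows = actual class, columns = predicted class


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(cm: Matrix, k: int) -> Tuple[float, float, float]:
    """Per-class precision, recall, and F1 for class index k."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column total
    actual_k = sum(cm[k])                    # row total
    precision, recall = tp / predicted_k, tp / actual_k
    return precision, recall, f1_from(precision, recall)


def cohen_kappa(cm: Matrix) -> float:
    """Agreement between actual and predicted labels, corrected for chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    expected = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(n)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts; class order: new, replay, revisit.
cm = [[50, 3, 2],
      [4, 20, 1],
      [2, 1, 17]]
revisit_precision, revisit_recall, revisit_f1 = precision_recall_f1(cm, 2)
kappa = cohen_kappa(cm)
reported_f1 = f1_from(0.8319, 0.8885)  # from the reported precision and recall
```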

Because the output of a neural network is abstract, analyzing the input data in relation to the prediction results is an empirical way to explore what the network has learned.

Time of revisit

First, the time of revisit was investigated and an increase in the number of attempts was observed during the last 30 min of the session as shown in Fig. 8.

Fig. 8 Actual number of attempts over time

When the correct predictions were plotted, the same trend was also observed as shown in Fig. 9.

Fig. 9 Correctly predicted attempts over time

To further examine time as a factor, a chi-square test of independence was used to test whether the time period was associated with the actual occurrence of new attempts, replays, and revisits, as well as with the predictions. The relationship between the 30-min time period and the actual attempts made was significant, χ²(6, N = 3632) = 17.92, p = 0.006. The actual number of attempts over time is shown in Table 4.

Table 4 Actual attempts over time

The same significant relationship was found between the predictions and the time period, χ²(6, N = 3237) = 14.13, p = 0.028. The number of correctly predicted attempts over time is shown in Table 5.

Table 5 Correctly predicted attempts over time
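The test used here can be sketched as follows. With three attempt classes and four 30-min periods in a 2-h session, the degrees of freedom are (3 − 1)(4 − 1) = 6, which matches the reported value. The counts in the table are invented for illustration and are not the values in Tables 4 or 5; the p-value function uses a closed form that holds only for even degrees of freedom, which is sufficient for this case.

```python
import math
from typing import List, Tuple


def chi_square(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_even_dof(stat: float, dof: int) -> float:
    """Upper-tail chi-square probability; closed form valid for even dof only."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form requires a positive even dof")
    half = stat / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(dof // 2))


# Hypothetical counts: rows = new, replay, revisit; columns = four 30-min periods.
counts = [
    [700, 680, 640, 560],
    [180, 185, 190, 200],
    [60, 70, 80, 87],
]
stat, dof = chi_square(counts)

# p-values implied by the two reported statistics.
p_actual = p_value_even_dof(17.92, 6)
p_predicted = p_value_even_dof(14.13, 6)
```

Evaluating the closed form at the two reported statistics gives values that round to the reported p = 0.006 and p = 0.028.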

In terms of the revisits, the prediction accuracy is fairly consistent across all periods, which suggests that the model was able to factor in the time of the attempt relative to the session (see Table 6).

Table 6 Revisit predictions over time

Moreover, the increase in the number of revisits over time, especially in the last 30 min of the session, could be attributed to the students’ awareness that the activity had a time limit. Although the students were informed that the result of the activity would not affect their grades, the game feature of earning badges might have encouraged peer competition, resulting in pressure to achieve more (Nemerow, 1996), which could have served as motivation to revisit the levels they previously failed to solve. Malhotra (2010) suggests that time pressure plays a part in the emergence of “competitive arousal,” which leads to a shift in the motivation of the learners. At this point, the learners preferred to return to problems they had already encountered, perhaps because they thought they were more likely to solve them than to solve new ones. The learner’s prior encounter with the problem during the pre-incubation phase provided an opportunity to prime solution ideas (Penney et al., 2004), which they were able to leverage in the later part of the session.

Problem type

The second feature was the type of problem. Each problem was classified based on its canonical solution: lever, ramp, springboard, or pendulum. This feature was analyzed to see whether the type of problem the student was solving at time t would be a factor in revisiting a previously unsolved problem with a similar solution. It is important to note that, for this analysis, having the same type of solution is not equivalent to being the same problem. For instance, if the student plays level 2 of playground 2, which is a pendulum problem, at time t, and plays level 4 of playground 1, which is also a pendulum problem, at time t+1, then the attempt at time t+1 is considered “preceded by a similar problem.” Consecutive attempts on the same level and playground are not considered as such. The number of actual and predicted attempts for each class based on the similarity of solution with the preceding problem is shown in Table 7.

Table 7 Actual and predicted attempts based on preceding problem
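The “preceded by a similar problem” flag described above can be sketched as follows. The problem ID format (playground and level) and the type mapping are hypothetical; only the rule itself, same solution type but a different problem than the one at time t, comes from the text.

```python
from typing import Dict, List, Optional


def preceded_by_similar(problems: List[str], solution_type: Dict[str, str]) -> List[bool]:
    """Flag each attempt whose preceding attempt was a different problem of the same type."""
    flags: List[bool] = []
    previous: Optional[str] = None
    for current in problems:
        similar = (
            previous is not None
            and current != previous  # consecutive attempts on one problem do not count
            and solution_type[current] == solution_type[previous]
        )
        flags.append(similar)
        previous = current
    return flags


# Hypothetical IDs of the form playground-level, with their canonical solution types.
types = {"P2-L2": "pendulum", "P1-L4": "pendulum", "P1-L1": "ramp"}
sequence = ["P2-L2", "P1-L4", "P1-L4", "P1-L1"]
flags = preceded_by_similar(sequence, types)
```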

To examine whether the preceding problem type is associated with the attempts, a chi-square test of independence was used. The result showed that whether or not the preceding problem is similar has a significant relationship with the actual attempts, χ²(2, N = 3632) = 14.74, p < 0.001, and with the predicted attempts, χ²(2, N = 3237) = 8.81, p < 0.05.

In terms of the revisits, Table 8 shows that the prediction accuracy, although a little higher when preceded by a similar problem, is still fairly consistent for both cases.

Table 8 Similarity of problem type and prediction results

Aside from time as a factor in the concept of IE-related priming (Penney et al., 2004), encountering problems with similar solutions during the incubation phase helped the students develop familiarity. It can be inferred that encountering a similar problem could have reminded the students of a previously unsolved problem and prompted them to revisit it. In the context of recollection and familiarity (Migo, Mayes, & Montaldi, 2012), encountering a similar problem during the incubation phase serves as a stimulus that cues the recall of details linked to an unsolved problem. This might trigger the critical step of bringing to mind information about a previously unsolved problem and relating it to the current activity. This behavior, in the context of an educational game and in relation to IE, can be considered a manifestation of the intermittent conscious work theory (Mayer et al., 1995; Weisberg, 2006) presented in the literature review.

Conclusions, limitations, and future work

This paper consolidated prior work and presented additional analysis on the incubation effect phenomenon among students playing an educational game called Physics Playground. Prior analyses, including the initial investigation, the coarse-grained model, and the fine-grained model, have been presented in conferences and published in proceedings. The initial investigation reported that students’ IE success rates matched their non-IE success rates, implying that IEs may indeed benefit students who are stuck. The predictive features discovered in the coarse-grained analysis using aggregated data matched theories from the literature: productivity or success rate of students (more badges earned), problem difficulty (at least 65% of all attempts on the problem were successful), total number of problems attempted (at most 30), and time period of revisit (within the first half of the session). The incidence of IE-False was also examined. Features found to lead to failure during a revisit include a very lengthy incubation duration, a lower success rate prior to the revisit, and attempting the problem more than twice during the revisit.

To improve the IE model’s performance, the fine-grained analysis was conducted using LSTM, t-SNE visualization, and X-means. Attempt-level data that included the actions the students performed to solve the problem were used. The fine-grained model performed better in terms of recall (91.62% vs 89.61%), precision (82.55% vs 50.73%), f1-score (86.84% vs 64.79%), and kappa (0.82 vs 0.22). The significant features were time of revisit (revisits in the early part of a time-limited session are more likely to be beneficial), duration of incubation (an incubation duration of more than 40 min in a 2-h session resulted in IE-False), problem difficulty, student’s productivity at the time of revisit (IE-True is more likely to occur if productivity is more than 50% at the time of post-incubation), and similarity with the preceding problem. In terms of game actions, more frequent use of the “erase” and “hover tutorial” features and less use of the “pause” function were observed. For problem-specific actions, improvements in the student’s drawing of a ramp and springboard were observed.

The additional analysis focused on understanding the factors associated with the post-incubation phase, or the revisit. Using LSTM and the same data as in the fine-grained analysis, attempts were classified as a new attempt, a replay, or a revisit. The model performed well (recall = 88.85%, precision = 83.19%, f1-score = 85.93%, kappa = 0.746), and quantitative analysis was conducted on the features derived from the input data to gain insights into their relationships. Results showed that time, specifically 30-min periods in a 2-h session, and encountering a level with a similar solution were factors in revisiting previously unsolved problems. From these findings, it can be inferred that in a time-limited session in a game-based learning environment, time pressure may contribute to the emergence of competitiveness among learners, which leads to more problem revisits, or a higher incidence of post-incubation, in the later part of the session. Revisiting previously unsolved problems after encountering a problem of the same type could be considered an indication of the intermittent conscious work theory (Mayer et al., 1995; Weisberg, 2006), where the learners try to link the current task to the unsolved problems they have set aside. Both time and intermittent conscious work helped in the development of familiarity, which the learners chose to utilize as a response to the “competitive arousal” (Malhotra, 2010) towards the end of the session.

These findings could help quantify the pedagogical practice in which teachers instruct students who are stuck on a problem to skip it and return to it at a later time. They suggest that incubation can be an effective problem-solving technique when the activities performed during the break are similar or related tasks, and the features extracted from this study could be translated into design features for other educational games. However, this work is limited in two respects. First, it deals with only one of the four approaches to defining incubation, namely intermittent conscious work, because even though the students are taking a break from the unsolved problem, their incubation time is still spent inside the game. Other IE experiments, although not in the context of a computer-based learning environment, included giving the students an entirely different activity during incubation or asking them to take a rest. The second limitation is the session time. The three phases of IE as operationalized in this work were bound to the 2-h time limit of the activity. Variations in which the incubation period lasts for hours or days would be interesting to explore.

Aside from contributing to what is known about IEs, this work consolidated the first attempt to investigate and model IE in the context of a computer-based learning environment with fine-grained interaction logs, such as Physics Playground. Most research on IE used standard tests to measure fluency and creativity (Baird et al., 2012; Fulgosi & Guilford, 1968; Gilhooly et al., 2013; Sio & Ormerod, 2015), mathematical adeptness (Fulgosi & Guilford, 1968; Segal, 2004; Tan, Zou, Chen, & Luo, 2015), and even memory (Ellwood et al., 2009). In these earlier works, test subjects were manually observed, recorded, and assessed on task performance, and were scored based on the results produced in the pre- and post-incubation phases. This study, on the other hand, opens up the idea of using computer-based learning environments to study phenomena similar in construct to IE, since the interaction logs of test subjects can be recorded automatically and hence more accurately.

Based on the limitations of this study and its findings, the following are recommended:

First, the data collection method used in this study was based on the concept of stealth assessment (Shute & Ventura, 2013), where interactions are recorded in logs and later used to study certain behaviors or phenomena, as opposed to an experimental research design where certain variables are manipulated or controlled. The students were not explicitly instructed to leave a problem when they were stuck. They were allowed to solve the problems in any sequence they wished and could return to any previously unsolved problem. Given this limitation, a follow-up experiment is recommended that would involve a control group and an experimental group, where one group is instructed and allowed to incubate and the other is not, to further study the benefits of incubation versus no incubation in an experimental setup.

Second, the features that predict IE extracted from the fine-grained model could be translated into rules or mechanics of an educational application and could also be used to conduct a comparative experiment. For example, after a player fails to solve a problem on the first attempt (pre-incubation), allow re-attempts, but when the player reaches an impasse, prompt the player to leave the problem and try other problems (incubation phase). When a similar problem is encountered (a problem with a similar solution or type as a previously unsolved problem), and the player has been relatively productive during the incubation phase, prompt the player to revisit the unsolved problem (post-incubation). With the development of a game implementing these features, other related constructs described in the literature review could also be studied further.
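The prompting rule described in this recommendation could be prototyped along the following lines. This is a sketch under stated assumptions, not a design from the study: the impasse threshold of three failed attempts is invented, the 50% productivity cut-off is borrowed from the fine-grained finding, and productivity is simplified to the success rate over the whole session rather than over the incubation phase only.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

IMPASSE_ATTEMPTS = 3    # hypothetical: failed tries on one problem that count as an impasse
MIN_PRODUCTIVITY = 0.5  # assumed cut-off, borrowed from the fine-grained finding


@dataclass
class Session:
    failures: Dict[str, int] = field(default_factory=dict)
    unsolved: Dict[str, str] = field(default_factory=dict)  # problem ID -> solution type
    attempts: int = 0
    successes: int = 0

    def record(self, problem: str, kind: str, solved: bool) -> Optional[str]:
        """Record one attempt and return a prompt for the player, if any."""
        self.attempts += 1
        if solved:
            self.successes += 1
            self.unsolved.pop(problem, None)
        else:
            self.failures[problem] = self.failures.get(problem, 0) + 1
            self.unsolved[problem] = kind
            if self.failures[problem] >= IMPASSE_ATTEMPTS:
                return f"leave {problem}"  # start of the incubation phase
        productive = self.successes / self.attempts > MIN_PRODUCTIVITY
        for other, other_kind in self.unsolved.items():
            if other != problem and other_kind == kind and productive:
                return f"revisit {other}"  # invite post-incubation
        return None


# Hypothetical play sequence: three failures on A, then a run of successes.
s = Session()
plays = [("A", "ramp", False), ("A", "ramp", False), ("A", "ramp", False),
         ("B", "lever", True), ("C", "lever", True), ("D", "lever", True),
         ("E", "lever", True), ("F", "ramp", True)]
prompts = [s.record(*play) for play in plays]
```

In this sequence the player is prompted to leave problem A after the third failure and to revisit it only after encountering another ramp problem while the session success rate exceeds the cut-off.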

Availability of data and materials

Data and material are not available as our consent forms did not include information regarding sharing data outside of the research study.



Abbreviations

IE: Incubation effect

PP: Physics Playground

LSTM: Long short-term memory




The authors would like to thank the Ateneo Laboratory for the Learning Sciences, the Department of Information Systems and Computer Science, and the University Research Council of the Ateneo de Manila University. We would also like to acknowledge the co-authors of our prior work on IE in PP published in international conferences: Mr. Joshua Martinez and Mr. Jun Rangie Obispo for their contributions to the initial investigation, and Prof. Joseph Beck for his input in the early stages of modeling. We also thank Prof. Valerie Shute for her assistance with questions about the canonical solutions in PP.


Not applicable.

Author information



MMPTF carried out the study, drafted, and revised the manuscript. MMTR supervised, provided valuable feedback, and contributed to the review and revision of the manuscript. Both authors read and approved the manuscript.

Corresponding author

Correspondence to May Marie P. Talandron-Felipe.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Talandron-Felipe, M.M.P., Rodrigo, M.M.T. The incubation effect among students playing an educational game for physics. RPTEL 16, 19 (2021).
