The current article describes an exploratory study focussing on joint attention behaviour (JAB; e.g. Carpenter & Liebal, 2012; Eilan, 2005; Eilan, Hoerl, McCormack, & Roessler, 2005; Mundy, 2013, 2018; Mundy & Newell, 2007; O’Madagain & Tomasello, 2019; Siposova & Carpenter, 2019; Tomasello, 1995) in dyadic interaction (i.e. interactions between two participants). By focussing on JAB, the aim is to better understand collaborative problem solving (CPS), especially its social aspects during remote CPS. Based on the socio-cognitive approach to learning, CPS is seen to lie in a two-dimensional space of social and cognitive domains that intermingle in the processes of problem solving (see Funke, Fischer, & Holt, 2018; Graesser et al., 2018; Scoular, Care, & Hesse, 2017; Zwiecki, Ruis, Farrell, & Williamson Shaffer, 2020). Thus, CPS includes the tasks (the cognitive domain) and the social infrastructure (the social domain) within which the participants create and share knowledge, monitor their progress and detect and repair the breakdowns in their communicative acts (Alterman & Harsch, 2017; Roschelle & Teasley, 1995).
Solving problems together and developing a shared understanding of a shared object or an aspect of a problem to create ‘if-then’ problem-solving rules require both collaboration and negotiations of meanings (Barron & Roschelle, 2009). As Schneider and Pea (2013; see also Schneider et al., 2018) noted, the concept of joint attention is closely associated with successful processes of joint problem solving. That is, if joint attention is not achieved, it is less likely for partners to establish common ground (e.g. Baker, 2015; Baker, Hansen, Joiner, & Traum, 1999; Clark & Brennan, 1991), take the partner’s perspective (e.g. Moll & Meltzoff, 2011, 2012) and build on ideas to solve problems together. In this regard, JAB is seen to form the cornerstone of social interaction, predicting productive collaboration (e.g. Barron, 2003; Barron & Roschelle, 2009; O’Madagain & Tomasello, 2019). Therefore, to better understand CPS and particularly its social aspects, it is necessary to focus on JAB.
Despite the growing interest in studying joint attention and its premisses to understand dyadic interaction, no unified interpretation exists for what is considered joint attention and how ‘jointness’ in joint attention is achieved (Carpenter & Liebal, 2012; Seemann, 2012b; Siposova & Carpenter, 2019). Generally, joint attention has been defined as a capacity to focus together with another on an external source or object in the environment (e.g. Eilan, 2005; Mundy, 2013, 2018; O’Madagain & Tomasello, 2019). According to Siposova and Carpenter (2019), the objects of joint attention can be diverse sensory inputs, such as visual or auditory stimuli, or they can be present, past, future or imaginary events or mental states (i.e. ideas and plans; e.g. Mundy, Sigman, Ungerer, & Sherman, 1986; O’Madagain & Tomasello, 2019). Thus, the objects of attention can be observed at two levels: as external sources or events, or as mental, ‘internal’ contents (O’Madagain & Tomasello, 2019).
Gaze following is viewed as a promising basis of JAB (Seemann, 2012b). It is linked to meaningful collaborative interactions (e.g. Schneider et al., 2018; Schneider & Pea, 2013, 2014), especially in tasks that require partners to build a shared problem space (Roschelle & Teasley, 1995). For example, when studying visual or perceptual joint attention and visual synchronisation in dyads (see, e.g. Liu et al., 2021; Olsen, Aleven, & Rummel, 2017; Schneider & Pea, 2013, 2014; Schneider et al., 2016, 2018), in which the object of attention is regarded as external, gaze can display alignment between the partners. In their dual eye-tracking study in a remote setting, Schneider and Pea (2013, 2014) found that, when the participants could see the gaze of their partner, more moments of visual joint attention (i.e. moments when the partners were looking at the same area of the screen during a 2-s time window) were reached. In addition, the percentage of moments of visual joint attention correlated with a higher quality of collaboration and mediated learning. Comparable outcomes have been reached across different eye-tracking settings. For example, in a co-located eye-tracking setting, higher recurrence of joint visual attention was found to correlate with task performance and learning outcomes (Schneider et al., 2016, 2018). As put forward by Schneider and Pea (2014), a measure of visual joint attention can be an interesting proxy, for example, for evaluating the quality of social interactions, and it can also serve as a basis for further analysing the data, such as by qualitative means.
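As an illustration only, the percentage of moments of visual joint attention described above can be sketched in Python. The sketch assumes that each partner's gaze has already been mapped to a hypothetical area label per sample, that the two recordings are time-aligned and equally long, and that the 2-s criterion is read as a tolerance on either side of each sample. None of these details is taken from Schneider and Pea (2013, 2014), whose actual procedure may differ.

```python
from typing import Optional, Sequence


def joint_attention_share(
    gaze_a: Sequence[Optional[str]],
    gaze_b: Sequence[Optional[str]],
    sample_rate_hz: float = 1.0,
    window_s: float = 2.0,
) -> float:
    """Share of partner A's samples that count as visual joint attention.

    A sample counts when partner B looked at the same area within
    +/- window_s seconds (an assumed reading of the 2-s criterion).
    None marks a sample with no valid gaze on any area.
    """
    if len(gaze_a) != len(gaze_b):
        raise ValueError("gaze sequences must be time-aligned and equally long")
    if not gaze_a:
        return 0.0

    lag = int(round(window_s * sample_rate_hz))  # tolerance in samples
    joint = 0
    for t, area in enumerate(gaze_a):
        if area is None:
            continue  # missing gaze can never be joint
        start = max(0, t - lag)
        stop = min(len(gaze_b), t + lag + 1)
        if area in gaze_b[start:stop]:
            joint += 1
    return joint / len(gaze_a)


# Hypothetical area labels, sampled once per second.
partner_a = ["chat", "task", "task", "instructions", None]
partner_b = ["task", "task", "chat", "chat", "instructions"]
share = joint_attention_share(partner_a, partner_b, sample_rate_hz=1.0, window_s=1.0)
print(f"{share:.0%} of samples show visual joint attention")
```

Such a share reflects only perceptual alignment; it does not show whether the partners recognise that they are attending to the same thing.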
In addition to gaze following, JAB includes the coordination aspect of joint attention and the sharing of attention (Carpenter & Liebal, 2012; Tomasello, 1995). In JAB’s richest definition, individuals must equally recognise that they are attending to the same thing (O’Madagain & Tomasello, 2019; Siposova & Carpenter, 2019; Tomasello, 1995). Thus, following Carpenter and Liebal (2012), it is only communication that ‘turns mutually experienced event into interaction, into something joint’ (p. 168). Appropriately, to be successful, CPS not only necessitates the lower attentional levels that can be found by analysing visual joint attention (see, e.g. Liu et al., 2021; Schneider & Pea, 2013, 2014; Schneider et al., 2016, 2018) but also requires considering joint attention to ‘internal’, mental content (O’Madagain & Tomasello, 2019). This represents ‘the ability to focus together in the conversation on the content of our mental states’ (O’Madagain & Tomasello, 2019, p. 1). By the contents of the mental states, O’Madagain and Tomasello (2019) meant, for example, the contents of any thoughts, plans, beliefs or reasons. Achieving visual joint attention to external content is considered a perceptual phenomenon. In joint attention to mental content, however, it is the linguistic exchanges that are perceptible: when the partners attend to those exchanges by monitoring one another’s attention and the partner’s reaction to these communicative acts, they jointly attend to mental contents (O’Madagain, 2016; O’Madagain & Tomasello, 2019).
There are multiple definitions of and ways to use the term ‘joint attention’, varying from visual joint attention to joint attention to mental contents (O’Madagain & Tomasello, 2019). Siposova and Carpenter (2019) argued that joint attention should not be considered a single state or binary event (i.e. there is or is not jointness). Instead, it should be viewed as a process comprising various, hierarchically nested, and closely connected but distinct phenomena that can be discovered in the related literature, all referred to as joint attention (see Eilan et al., 2005; Mundy, 2018; Seemann, 2012a). At the surface level, definitions may sound similar, but when elaborated on in more detail, significant differences can emerge among them. Accordingly, Siposova and Carpenter (2019) have developed a spectrum of ‘jointness’, described as ‘a typology of social attention and social knowledge’ (p. 261) that aims to cover the diversity of the definitions that all include the notion of a triadic relationship between self, other and an object of attention. The typology also defines distinctive levels of knowledge related to the different levels of jointness as individual, common, mutual or shared knowledge. Moreover, according to Siposova and Carpenter (2019), these levels are distinctive in terms of the participant’s perspective (i.e. second- and third-person perspectives; see also, e.g. Moore & Barresi, 2017) and the type of knowledge related to a particular attentional level. They also differ in terms of the level of dependency between partners, as well as the level of experience (i.e. individual or jointly created). An essential precondition for each of the four levels of social attention is the individual’s ability to engage in individual attention. This refers to the situation in which an individual is attending to something in the environment with a first-person perspective. 
Joint attention (whether to external entities, situations or involving communicative acts) is closely connected to collaboration and reflective reasoning with others (O’Madagain & Tomasello, 2019), representing the core elements of CPS. Accordingly, this study takes the typology of jointness by Siposova and Carpenter (2019) as a promising conceptual ‘lens’ to better understand and exemplify CPS process diversity, particularly regarding social aspects of remote CPS.
When focussing on CPS processes, the study takes the unique properties of the remote, game-like CPS assessment environment (Assessment and Teaching of 21st Century Skills [ATC21S]; e.g. Care, Griffin, & Wilson, 2018; Care, Scoular, & Griffin, 2016; Scoular et al., 2017) as its point of departure. ATC21S was one of the pioneering international projects in exploring CPS competency for assessment and teaching purposes (e.g. Care et al., 2018; Care et al., 2016; Griffin & Care, 2015; Griffin, McGaw, & Care, 2012; Scoular et al., 2017). The CPS tasks of the ATC21S environment have been designed for dyads following a comprehensive CPS framework by Hesse, Care, Buder, Sassenberg, and Griffin (2015; see also Care et al., 2016; Scoular & Care, 2020; Scoular et al., 2017). The framework of CPS covers both social and cognitive elements of the CPS construct (cognitive, social and regulatory aspects), and it amalgamates theoretical knowledge from social psychology and problem solving. In brief, the framework involves three main strands of social elements (i.e. participation, perspective taking, social regulation) and two main strands of cognitive elements (i.e. task regulation, knowledge building), which are all further divided into sub-elements (19 elements in total; Hesse et al., 2015). In CPS, the social elements are related to how participants coordinate and communicate with one another (e.g. Clark & Brennan, 1991; Richardson, Dale, & Kirkham, 2007), which is considered particularly important in synchronous collaboration (Baker, 2015), the context of this study. In addition, coordination is fundamental in establishing mutual knowledge or common ground (e.g. Clark & Brennan, 1991). Yet, according to Barron (2000), this can be challenging for the partners in problem-solving discussions because of the often new and indefinite goals, different ideas and terms, as well as their relations.
The social aspects are also related to how the partners regulate and resolve differences among the collaborating participants (e.g. Hadwin, Järvelä, & Miller, 2018). The cognitive elements, in turn, are related to how effectively and efficiently participants solve the problem (e.g. Mayer, 1992, 1998). The designed ATC21S tasks, based on the framework, both enhance and require CPS elements to occur (e.g. Care et al., 2016; Hesse et al., 2015; Scoular et al., 2017). Thus, the tasks aim to encourage the student to collaborate with another student, and the collaborative tasks are designed to stimulate and elicit the social and cognitive elements of the framework. To succeed in CPS task completion, the tasks require varied knowledge, expertise and skills, both in terms of social and cognitive processes. Taken together, the underlying objective of CPS and the task designs are related to bringing about the continued attempts of participants to acquire a shared understanding of a problem or challenge (Roschelle & Teasley, 1995). This can, via engaging in the peer- or group-level process (e.g. Sinha, Kempler Rogat, Adams-Wiggins, & Hmelo-Silver, 2015), produce learning. According to Dillenbourg, Lemaignan, Sangin, Nova, and Molinari (2016), this can be referred to as ‘the upper class of collaborative learning’ (p. 228), which requires a high level of joint attention, for example, to a task-related object or an aspect of the problem (Baker, 2015).
To better understand JAB in dyadic interactions during CPS, the gaze behaviour of the partners is a significant resource. To examine gaze patterns, in eye-tracking studies, the predominant focus has been on the overall looking times at predefined areas of interest (AOIs; spatial information as ‘where’ questions; see, e.g. de Leeuw, Segers, & Verhoeven, 2016; Hautala et al., 2019; Liu et al., 2021). Moreover, to study the ‘when’ question of eye gazing, cross-recurrence plots (e.g. Marwan & Kurths, 2002; Richardson & Dale, 2005) have been commonly used to study joint attention to external contents, such as visual joint attention or gaze alignment. Cross-recurrence is a general measure that quantifies the similarity or the coupling between two dynamical systems (Nüssli, Jermann, Sangin, & Dillenbourg, 2013; Richardson & Dale, 2005). When studying collaborative learning in remote and co-located dual eye-tracking situations, the cross-recurrence plots (Jermann, Mullins, Nüssli, & Dillenbourg, 2011) and augmented cross-recurrence plots (Schneider et al., 2016, 2018), for example, have been particularly capable of visualising the temporal evolution of gaze behaviour in achieving visual joint attention. Yet, when analysing joint attention to mental contents, often related to the higher attentional levels of JAB, including the contents of social interaction as the primary source (e.g. Falck-Ytter, Bölte, & Gredebäck, 2013; Holler & Kendrick, 2015), this analytic approach may not be sufficient. Therefore, exploring JAB as a socio-linguistic phenomenon and combining ‘where’ participants look with ‘when’ they look at the AOIs (i.e. the timing of gazing) in the interactional sequences (see Korkiakangas, 2018) is more suitable here. Thus, event-related measures focussing on the interactional organisation of gaze are more informative about what makes some instances of gazing ‘social’ (e.g. Dindar, Korkiakangas, Laitila, & Kärnä, 2017; Korkiakangas, 2018; Tuononen, Korkiakangas, Laitila, & Kärnä, 2016).
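To make the cross-recurrence measure mentioned above more concrete, a minimal Python sketch is given here. It assumes categorical gaze data, with one hypothetical AOI label per sample for each partner, and it treats two samples as recurrent when the labels match. This is a simplified illustration of the general idea, not the procedure of any of the cited studies and not the analysis applied in the current study.

```python
from typing import Dict, List, Optional, Sequence


def cross_recurrence(
    seq_a: Sequence[Optional[str]],
    seq_b: Sequence[Optional[str]],
) -> List[List[int]]:
    """Binary matrix: cell [i][j] is 1 when A at time i and B at time j share an AOI."""
    return [
        [1 if a is not None and a == b else 0 for b in seq_b]
        for a in seq_a
    ]


def lag_profile(matrix: List[List[int]], max_lag: int) -> Dict[int, float]:
    """Recurrence rate on each diagonal j - i = lag.

    A positive lag means partner B looks at the AOI after partner A did.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if matrix else 0
    profile: Dict[int, float] = {}
    for lag in range(-max_lag, max_lag + 1):
        cells = [matrix[i][i + lag] for i in range(rows) if 0 <= i + lag < cols]
        profile[lag] = sum(cells) / len(cells) if cells else 0.0
    return profile


# Hypothetical AOI labels, one per sample.
gaze_a = ["chat", "task", "task", "instructions"]
gaze_b = ["task", "chat", "task", "task"]
plot = cross_recurrence(gaze_a, gaze_b)
profile = lag_profile(plot, max_lag=1)
for lag, rate in profile.items():
    print(f"lag {lag:+d}: recurrence rate {rate:.2f}")
```

A peak at a non-zero lag in such a profile suggests that one partner's gaze tends to follow the other's, but it carries no information about the chat messages or actions that accompany the gaze.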
In the current study, with the challenging dynamic scene of the remote environment, there are multiple eye-gaze behaviours linked to JAB, such as gazing at the chat window, the actionable artefacts and the instructions. Although gaze is not organised into sequences in the same way as verbal interaction, it is organised according to the actions it performs (e.g. Chepinchikj, 2020). Therefore, in the current study, it is expected that focussing on the gaze patterns in parallel with the interactional sequences of the communicating partners identified from the log data will help us go beyond these sequences and better identify behaviours related to JAB.
To explore and identify behaviours related to JAB in remote CPS and related meaningful events, qualitative interaction analysis (e.g. Valde, 2017) is applied based on multiple observational data (log files, eye-tracking data). The remote ATC21S environment utilised here includes dynamic stimuli (i.e. actionable artefacts) and a chat property designed for free-flowing written interaction in dyads as the communication affordance. Whilst the automatically generated log files as chat and actions of interacting dyads incorporate multiple pieces of information from joint processes (Graesser et al., 2018), to make visible the typology of jointness as defined by Siposova and Carpenter (2019), the eye-gaze patterns of the individual partners are also identified as significant.