Learning culturally situated dialogue strategies to support language learners

Successful language learning requires an understanding of the target culture in order to make valuable use of the learned language. To understand a foreign culture, language students need knowledge of its related products, as well as the skill of comparing them to those of their own culture. One way for students to understand foreign products is by making Culturally Situated Associations (CSA), i.e., relating the products they encounter to products from their own culture. In order to provide students with CSA that they can understand, we must gather information about their culture, provide them with the CSA, and make sure they understand it. In this case, a Culturally Situated Dialogue (CSD) must take place. To carry out the dialogue, dialogue systems must follow a dialogue strategy. However, previous work showed that handcrafted dialogue strategies are less effective than machine-learned dialogue strategies. In this research, we propose a method to learn CSD strategies to support foreign students, using a reinforcement learning algorithm. Since no previous system providing CSA has been implemented, the method allows the creation of CSD strategies when no initial data or prototype exists. The method was applied to generate three different agents: the novice agent was based on an eight-state feature space, the intermediate agent on a 144-state feature space, and the advanced agent on a 288-state feature space. Each of these agents learned a different dialogue strategy. We conducted a Wizard of Oz experiment during which the agents' role was to support the wizard in their dialogue with students by providing them with the appropriate action to take at each step. The resulting dialogue strategies were evaluated based on the quality of the strategy. The results suggest the use of the novice agent at the first stages of prototyping the dialogue system. The intermediate agent and the advanced agent could be used at later stages of the system's implementation.


Introduction
Successful language learning requires more than knowledge of vocabulary and grammar skills. It is at least as important to develop an understanding of the different culture in order to make valuable use of the learned language (Wagner et al. 2017). Intercultural competence is viewed as being as important as communication and should be an integral part of the language curriculum (Stewart 2007). However, the inclusion of the cultural aspect in language teaching has been challenging for teachers (Kissau et al. 2012). An important aspect of understanding another culture is the knowledge of its related products, as well as the skill of comparing them to those of one's own culture. By putting ideas or products of two cultures side by side, the student can see how each would look from the other perspective and avoid misunderstandings (Byram et al. 2002).
As a part of the language learning process, it is recommended that students visit a foreign country in order to practice the language and explore the culture (Byram et al. 2002). During these visits, food products are usually encountered immediately and are among the most obvious products that highlight cultural differences. Food, eating, food behaviors, and food social norms are intimately connected to cultural identity and deeper cultural concepts (Kanafani-Zahar 1997; Scholliers 2001). One method to explain a particular food product to a student would be to display a complete listing of the ingredients as well as a description of the food product. However, this kind of information might leave them with questions such as: What does it taste like? What is the texture? When do we use it?
In a situation where providing a simple description of a product fails to deliver a complete understanding of the meaning of the product, an efficient alternative would be to relate the product to a similar product in the student's culture. This would mean offering Culturally Situated Associations (CSA) that allow learners to understand the meaning, usage, and taste of the food product they are inquiring about. A system that supports students with CSA must deliver the associations and make sure that those associations were understood. These requirements can be fulfilled by learning Culturally Situated Dialogue (CSD) strategies that support the realization of those objectives. However, when no initial observations or system exists, learning dialogue strategies is a challenging task. In fact, developers or designers of a CSD system may not be able to predict the most appropriate action to be taken by the system at each moment and would have to invest in a time-consuming task to predict the most appropriate action in each situation. Moreover, the number of different utterances that could occur in a dialogue system is large, and previous work showed that automatic dialogue strategies outperformed handcrafted ones (Scheffler and Young 2002). Ishida (2016) highlighted the need "to model agents that can not only support a specific culture, but also recognize the differences among cultures, and differences among the understanding of cultural differences." Aligning with the need for an agent that can provide CSA and addressing the challenge of automating CSD strategies in the particular situation where no initial system exists, this research proposes a method to learn CSD strategies to support students when no data or working prototype exists.

Culturally situated associations
A variety of intercultural communication models have been proposed by researchers. However, the most influential model is attributed to Byram because his approach provides a holistic view of intercultural competence and has defined objectives and practical derivations (Chen and Yang 2014). Byram's model defines the following five skills needed to accomplish successful intercultural communication: intercultural attitudes, knowledge, interpreting and relating, discovery and interaction, and critical cultural awareness (Byram 1997). Two of those skills are necessary in the initial stages of becoming familiar with a new culture and are essential to understanding foreign concepts or products (Byram 1997):
• Discovery or knowledge: knowledge about a social group and their products and practices in the foreign visitor's own country; and
• Interpreting and relating: foreign visitors relate the information they get to information from their own culture.
Over the years, efforts have been made to use computer technologies to support the teaching of culture. To help students understand the other culture, different approaches were implemented: showing students juxtaposed texts from different cultures (Liaw 2006), using concordances of two corpora to investigate different usages of a word in different cultures (Leech and Fallon 1992), and using web-based tools such as online forums, weblogs, Skype, and email (Chen and Yang 2014). However, most previous studies focused on fostering Byram's skills of intercultural attitudes, knowledge, discovery, and interaction.
The skills of interpreting and relating are not tackled in computer-assisted education. They consist of putting concepts or products from two or more cultures side by side and seeing how each might look from the other perspective (Chen and Yang 2014). Providing students with CSA means putting the concepts or products from the student's culture and from the target culture side by side and helping the student interpret and relate the concepts that they encounter.

Dialogue strategies
In real life situations, interpreting and relating cannot be achieved in real time as CSA requires a deep knowledge about the foreign culture and information about the student's culture. In order to provide students with CSA that they can understand, we must gather information about their culture, provide them with the CSA, and make sure they understand it. In this case, a Culturally Situated Dialogue (CSD) must take place. To carry the dialogue, dialogue systems must follow a dialogue strategy.
The recent literature shows a growing interest in the implementation and use of automatic dialogue systems. The development of such dialogue systems, and more particularly the development of dialogue strategies, is challenging (Eckert et al. 1997). In order to conduct a dialogue efficiently through a series of interactions with the user, dialogue strategies are needed. By quantifying the achievement of the dialogue goal as well as the efficiency of the strategy, it is possible to describe the system as a stochastic model that can be used for learning those dialogue strategies (Levin et al. 1998). This method has many advantages, including the possibility of automating the evaluation of the dialogue strategies as well as their automatic design and adaptation. In previous work on dialogue systems, reinforcement learning was used to learn the information presentation strategies of Wizard of Oz dialogues and to replicate them. Wizard of Oz allows the learning of dialogue strategies when no initial system exists. The results showed that reinforcement learning combined with a Wizard of Oz experiment allows the development of optimal strategies when no working prototype is available (Rieser and Lemon 2008). In fact, reinforcement learning significantly outperformed supervised learning when interacting in simulation as well as with real users (Rieser and Lemon 2008). However, unlike standard dialogue systems that take into account user-related properties, the challenge in learning an optimal CSD strategy consists of learning which information about the student's culture, if any, should be inquired about and in which order.

Wizard of Oz
The Wizard of Oz (WoZ) is a research experiment in which the users interact with a computer system that they believe to be autonomous. The computer system is actually operated either partially or completely by a human being (the wizard) (Kelley 1984). The WoZ experiment is useful in different cases. It allows the gathering of information when basic knowledge about the user performance during a computer-based interaction is lacking. Moreover, the use of WoZ allows many speech designers to participate in building the knowledge about the user performance during a computer-based interaction. Finally, WoZ allows an iterative design approach for building user interfaces, as it is easy to use, requires little programming, and supports rapid testing and interface modifications (Klemmer et al. 2000). The design of a WoZ experiment may contain different amounts of control, ranging from a complete automation of the interaction to an interaction solely dependent on the wizard, as well as mixed-initiative interactions (Riek 2012). Green, Huttenrauch, and Eklundh (2004) set some of the most recognized conditions for conducting a WoZ experiment: the user should have access to specific instructions, and the designers should have a behavior hypothesis as well as a specified robot behavior. The architecture requirements of a WoZ experiment were set by Fraser and Gilbert (1991, pp. 81-99), who state that (1) "It must be possible to simulate the future system, given human limitations"; (2) "It must be possible to specify the future system's behavior"; (3) "It must be possible to make the simulation convincing." The implementation of WoZ experiments should use scenarios to place additional constraints on the study. Previous guidelines highlight the importance of scenario constraints for WoZ experiments (Dahlbäck et al. 1993; Fraser and Gilbert 1991; Green et al. 2004; Riek 2012).
The scenario constraints give participants a task to solve that requires the use of the system, and for which there is not a single way to solve the problem (Riek 2012; Dahlbäck et al. 1993). Finally, in a review paper, Riek (2012) went through 54 papers and categorized them by the type of wizard control used. Of these papers, 72.2% reported using the WoZ experiment to control a natural language processing component, such as having the robot engage in a dialogue and make appropriate utterances. The use of WoZ has been shown to be beneficial for designing and testing dialogue strategies when no initial data or working prototype exists (Rieser and Lemon 2008). Using WoZ is a way of collecting data before actually building a system that might need this data to be built. Moreover, it allows parts of the system to be tested without having to design and program the whole system.
Figure 1 shows the system architecture. The WoZ experiment is used because no working prototype or initial CSD system is available. The student and the wizard communicate through Skype to allow the wizard to see the product the student is asking about. In order to provide the wizard with the optimal dialogue strategy, an agent is trained based on a reinforcement learning algorithm and passes to the wizard the optimal action to take at each step. The wizard first reports their state of knowledge to the agent through a web interface (e.g., I do not have any information about the student's country yet). Once the agent receives the current state of knowledge of the system, it provides the wizard with the appropriate action to take (e.g., ask for the student's country). If the agent suggests querying the associated concept, the wizard retrieves the CSA from a provided database. The database contains food items as well as their related country of origin, region of origin, ingredients, and usage. The dialogue, directed by the agent and executed by the wizard, is carried out until the CSA is provided to the student and understood by them.

Identification of dialogue patterns
In order to extract the necessary components needed to build the feature space of the reinforcement learning algorithm and create the automatic dialogue strategies, we first identify common natural dialogue patterns to provide CSA to students.
To identify the possible dialogue patterns, we first conducted interviews with tourists in Nishiki Market, a traditional food market in Kyoto. We interviewed 15 tourists coming from western countries, chosen randomly during their visit to the market. We decided to interview tourists instead of students because Japanese language students might have different levels of familiarity with Japanese products depending on their language level. This difference might lead to dialogue patterns that are not representative of the ones a beginner language student would have. The breakdown of gender was balanced, and the participants were from Europe, New Zealand, and the USA. The tourists were asked to list the questions that they would have wanted to ask if it had been possible to communicate with the shop clerks and get an answer. We received 34 questions from the participants. Table 1 shows the different questions asked by participants from different countries.
Similar questions were put together, and the tourists' questions were categorized by question topic. The questions of the tourists were classified into the three categories shown in Table 2. The first category contains the questions about the ingredients of a particular food. Questions about the taste were classified under the ingredients category, as we considered that the ingredients of the food can give an idea about the taste (salty, sweet, sour, etc.). The second category includes the questions about the usage. The last category includes general questions about the composition and the usage of the food (e.g., "What is this?" and "What is the difference between X and Y?").
Based on the questions provided by the tourists, we created typical dialogues that could happen between shop owners and students during their travels. During those conversations, shop owners naturally follow a CSD strategy to answer the questions of the students with CSA. We match each of the examples to a pattern of CSD. To further understand the CSD, we define several terms as follows:
• Target concept is the concept that needs to be explained.
• Associated concept is used to explain a target concept. It is a concept that belongs to a different culture than the target concept.
• Common attribute is an attribute or a property that belongs to both the target and the associated concepts.
• Cultural attribute, such as a location or a language, is a common attribute that contributes to determining a culture.
Using the previous terms, we classify culturally situated conversations into several culturally situated dialogue patterns:

Example conversation 1
Student: What is this and how does it taste?
Shop owner: It is Neri Goma. It is a paste made out of roasted sesame seeds. Where are you from?
Student: Iraq.
Shop owner: It is like Tahine.

Dialogue pattern 1: using cultural attribute as a pivot
Student: Question about the taste of the target concept.
Shop owner: Question to identify the cultural attributes of the student.
Student: Student provides the cultural attributes.
Shop owner: Finds the associated concept whose cultural attributes match the student's cultural attributes and whose common attributes related to the taste are identical to those of the target concept.

Example conversation 2
Student: What is this? How do we use it?
Shop owner: It is Neri Goma. It is a paste made out of roasted sesame seeds. Where are you from?
Student: Iraq.
Shop owner: It is like Tahine, but in Japan it is mainly used in sweets.

Dialogue pattern 2: comparative association
Student: Question about a target concept.
Shop owner: Question to identify the cultural attributes of the student.
Student: Student provides the cultural attributes.
Shop owner: Finds the associated concept whose cultural attributes match the student's cultural attributes and whose common attributes related to the taste are identical to those of the target concept. If other common attributes differ from those of the target concept, the differences are presented to the student.

Example conversation 3
Student: What is this?
Shop owner: It is Udon, noodles made out of wheat and flour. They are usually eaten in broth.
Student: What is the difference with Soba?
Shop owner: Udon is made out of wheat and Soba out of buckwheat. Where are you from?
Student: Italy.
Shop owner: Udon is more like Spaghetti and Soba like Pizzoccheri.

Dialogue pattern 3: intra-cultural comparison
Student: Question about the difference between two target concepts.
Shop owner: Question to identify the cultural attributes of the student.
Student: Student provides the cultural attributes.
Shop owner: The difference between the two target concepts is identified by comparing all their common attributes. Based on the cultural attributes of the student, two associated concepts with the same difference in the common attributes are found.
Based on the previous dialogue patterns, we extract the components essential to conduct CSD strategies:
-Target concept
-Associated concept
-Cultural attributes
-Common attributes
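As an illustration of how the four components interact in dialogue patterns 1 and 2, the following minimal Python sketch looks up an associated concept in a toy database. The `Concept` class, the `find_associated` function, the attribute keys, and the attribute values (in particular the usage listed for Tahine) are hypothetical and chosen only for illustration; they are not the database or the procedure used in this work.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    name: str
    cultural: dict  # cultural attributes, e.g. {"country": "Japan"}
    common: dict    # common attributes, e.g. {"ingredients": ..., "usage": ...}


def find_associated(target, student_country, asked_attribute, database):
    """Return (associated concept, differing attributes), or (None, {}).

    A candidate qualifies when its cultural attribute matches the student's
    and its value for the asked common attribute equals the target's.
    """
    for candidate in database:
        if candidate.cultural.get("country") != student_country:
            continue
        if candidate.common.get(asked_attribute) != target.common.get(asked_attribute):
            continue
        # Pattern 2: report the other common attributes that differ.
        differences = {
            key: (value, candidate.common.get(key))
            for key, value in target.common.items()
            if key != asked_attribute and candidate.common.get(key) != value
        }
        return candidate, differences
    return None, {}


# Toy data (illustrative values only).
neri_goma = Concept("Neri Goma", {"country": "Japan"},
                    {"ingredients": "roasted sesame paste", "usage": "sweets"})
tahine = Concept("Tahine", {"country": "Iraq"},
                 {"ingredients": "roasted sesame paste", "usage": "savory dishes"})
database = [neri_goma, tahine]

associated, differences = find_associated(neri_goma, "Iraq", "ingredients", database)
print(associated.name, differences)
```

When no concept in the student's culture shares the asked attribute, the sketch returns `None`, which corresponds to the case where the wizard cannot find an associated concept.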

The reinforcement learning algorithm
The Markov decision process is a mathematical formalism that is used to implement the reinforcement learning algorithm. Our algorithm was based on Ng's (2000) work. The main components of this formalism and their implementation are as follows.
State and action space: the state of the reinforcement learning algorithm amounts to all the knowledge that the system (the wizard in our current system) possesses about the internal and external resources that it is interacting with (e.g., country of the student, associated concepts). The action set of the dialogue system includes all possible actions it can accomplish. It includes the interactions with the user (e.g., asking the student for their region, providing the student with an associated concept) as well as the interactions with other resources (e.g., searching for the associated concepts).
When the system's current state is s and an action a is taken, the state changes to s'. For example, when the system is in an initial state and the wizard does not have any information, the agent will ask the wizard to interact with the student and obtain a specific piece of information. The next state, s', will depend on whether the wizard obtained the information or not. We identified the possible state spaces based on the components extracted from the dialogue patterns. The target concept is assumed to be known, as the wizard would be interacting with the student and would be able to identify it. The cultural attributes are necessary in order to determine the culture of the student and thus the culture in which the associated concepts should be found. Students usually have a question that is related to a particular common attribute (e.g., usage, ingredients). The common attributes are necessary as they will be the basis of the comparison between the target concept and the associated concept. The action space is directly extracted from the state space. Based on the previously defined components, we created three levels of state spaces with different granularity. The three resulting agents were named, respectively, novice agent, intermediate agent, and advanced agent.
Transition probabilities: the probabilities of transitioning from a state s to a state s' given an action a are estimated using observed data. The estimated transition probability, $P_{s,a,s'}$, is computed as follows:
$$P_{s,a,s'} = \frac{\text{number of times we took action } a \text{ in state } s \text{ and got to } s'}{\text{number of times we took action } a \text{ in state } s}$$
In the case where an action a is never taken from a state s, we consider $P_{s,a,s'}$ to be equal to $\frac{1}{\text{number of states}}$, assuming that the probability is equally distributed over all states.
Reward: we suppose that the reward is unknown. We can also compute the expected immediate reward in a specific state s as the average reward observed in state s.
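The estimate described above can be sketched in a few lines of Python. This is a minimal illustration, assuming observations are stored as `(s, a, s_next)` tuples; the function name `estimate_transitions` and the data layout are assumptions, not the implementation used in this work.

```python
from collections import Counter


def estimate_transitions(observations, states, actions):
    """Estimate P[(s, a, s_next)] from a list of (s, a, s_next) tuples.

    If action a was never taken in state s, the probability is spread
    uniformly over all states (1 / number of states).
    """
    pair_counts = Counter((s, a) for s, a, _ in observations)
    triple_counts = Counter(observations)
    n_states = len(states)

    P = {}
    for s in states:
        for a in actions:
            total = pair_counts[(s, a)]
            for s_next in states:
                if total == 0:
                    P[(s, a, s_next)] = 1.0 / n_states
                else:
                    P[(s, a, s_next)] = triple_counts[(s, a, s_next)] / total
    return P


# Toy usage: two states, one action, three observations from state 0.
toy_states = [0, 1]
toy_actions = ["ask"]
toy_observations = [(0, "ask", 1), (0, "ask", 1), (0, "ask", 0)]
toy_P = estimate_transitions(toy_observations, toy_states, toy_actions)
print(toy_P[(0, "ask", 1)], toy_P[(1, "ask", 0)])
```

In the toy usage, state 1 is never observed as a starting state, so its outgoing probabilities fall back to the uniform value.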
Value iteration and policy: a policy is any function π that maps the states to the actions. A policy π is executed if, whenever we are in state s, we take the action a = π(s). The value function for a policy π is the expected sum of discounted rewards when we start in state s and take actions according to π. The value function of a policy π is given by the Bellman equation (Bellman 2013):
$$V^{\pi}(s) = R(s) + \gamma \sum_{s'} P_{s,\pi(s),s'} \, V^{\pi}(s')$$
where $R(s)$ is the immediate reward in state s and $\gamma$ is the discount factor. The Bellman equation states that the expected sum of discounted rewards $V^{\pi}(s)$ is given by the sum of the immediate reward and the expected sum of future rewards. We also define the optimal value function $V^{*}(s) = \max_{\pi} V^{\pi}(s)$, which is the best expected sum of discounted rewards that can be reached using any policy. Based on the previous equations, we describe the algorithm that we used to calculate the value function and to get the best policy:
-For each state s, initialize $V(s) = 0$.
-Repeat until convergence: for each state s, update $V(s) := R(s) + \max_{a \in A} \gamma \sum_{s'} P_{s,a,s'} \, V(s')$.
-The policy in state s is the action $a \in A$ that maximizes $\sum_{s'} P_{s,a,s'} \, V(s')$.
In this algorithm, we update the estimated value function based on the Bellman equation. For every state s, we compute the new value of V(s). After a certain number of iterations, the value is expected to converge towards V*(s).
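The value iteration procedure can be sketched as follows. This is a minimal illustration of standard value iteration, not the code used in this work: the discount factor `gamma=0.9`, the tolerance, the dictionary layout of `P` and `R`, and the two-state toy problem are all assumptions made for the example, and final states are not given special treatment.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Return (V, policy) for transition table P[(s, a, s_next)] and rewards R[s]."""

    def expected_next_value(s, a, V):
        return sum(P[(s, a, s_next)] * V[s_next] for s_next in states)

    V = {s: 0.0 for s in states}  # initialize V(s) = 0
    for _ in range(max_iter):
        delta = 0.0
        for s in states:
            best = max(expected_next_value(s, a, V) for a in actions)
            new_value = R[s] + gamma * best  # Bellman update
            delta = max(delta, abs(new_value - V[s]))
            V[s] = new_value
        if delta < tol:  # stop once the values no longer change
            break

    # Greedy policy with respect to the converged value function.
    policy = {
        s: max(actions, key=lambda a: expected_next_value(s, a, V))
        for s in states
    }
    return V, policy


# Toy problem (assumed): state 1 is rewarding and absorbing;
# from state 0, "good" moves to state 1 and "bad" stays in state 0.
vi_states = [0, 1]
vi_actions = ["good", "bad"]
vi_P = {
    (0, "good", 0): 0.0, (0, "good", 1): 1.0,
    (0, "bad", 0): 1.0, (0, "bad", 1): 0.0,
    (1, "good", 0): 0.0, (1, "good", 1): 1.0,
    (1, "bad", 0): 0.0, (1, "bad", 1): 1.0,
}
vi_R = {0: 0.0, 1: 1.0}
vi_V, vi_policy = value_iteration(vi_states, vi_actions, vi_P, vi_R)
print(vi_V, vi_policy)
```

The sketch updates the values in place within each sweep, which is one common variant; updating all states from the previous sweep's values converges to the same result.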

The novice agent
The first level feature space produces the novice agent. The state space includes only three entries that represent the mental state of the system; in other terms, the current state of the wizard.
-Does not know the user's culture/knows the user's culture.
-Does not know the associated concept/knows the associated concept.
-Knows that the user does not understand the concept/knows that the user understands the concept.
Every entry can take either of its two values, giving us a total of eight states, including two final states. The final states are the states we want the agent to reach at the end of the dialogue. An episode of the reinforcement learning algorithm ends when a final state is reached. At the end of the dialogue, the student should get an associated concept that answers their question, and they should be able to understand the associated concept provided to them. The final states are all the combinations of entries that include the two following values: knows the associated concept and knows that the user understands the concept:
-Knows the user's culture/knows the associated concept/knows that the user understands the concept.
-Does not know the user's culture/knows the associated concept/knows that the user understands the concept.
For the first level feature space, the action space includes only three actions:
-Identify the user's culture.
-Identify the associated concept.
-Ask if the user understood the concept.
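The novice state space is small enough to enumerate directly. The sketch below, with hypothetical entry and action names, builds the eight states from the three binary entries and selects the final states as those in which the associated concept is known and the user is known to understand it.

```python
from itertools import product

# Hypothetical identifiers for the three binary entries and three actions.
NOVICE_ENTRIES = ("knows_culture", "knows_associated_concept", "knows_user_understands")
NOVICE_ACTIONS = ("identify_culture", "identify_associated_concept", "ask_if_understood")

# Every combination of False/True over the three entries.
novice_states = [
    dict(zip(NOVICE_ENTRIES, values))
    for values in product((False, True), repeat=len(NOVICE_ENTRIES))
]

# Final states: associated concept known and user understanding confirmed.
novice_final_states = [
    state for state in novice_states
    if state["knows_associated_concept"] and state["knows_user_understands"]
]

print(len(novice_states), len(novice_final_states))
```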

The intermediate agent
The second level state-action space produces the intermediate agent. The second level state space is the result of breaking down the first level state space into more precise states of knowledge. It includes six entries that represent the mental state of the system.
-Does not know the user's country/knows the user's country.
-Does not know the user's region/knows the user's region.
-Does not know the common attributes/knows the common attributes.
-Does not know if there is an associated international concept/knows that there is an associated international concept/knows that there is not an associated international concept.
-Does not know the cultural associated concept/knows the cultural associated concept.
-Does not know if the student understood the associated concept/knows that the student understood the associated concept/knows that the student did not understand the associated concept.
Every entry can take any of its values, with all combinations giving us a total of 144 states, including 15 final states. To be in a final state, the agent should know the associated concept and should know that the user understood the associated concept. Moreover, the knowledge of the system should be consistent (e.g., a state in which the system knows the cultural associated concept but does not know either of the cultural attributes is not a final state).
For the second level state space, the action set includes six actions:
-Identify the user's country.
-Identify the user's region.
-Identify the common attributes.
-Identify if there is an associated international concept.
-Identify if there is a cultural associated concept.
-Ask if the user understood the concept.

The advanced agent
The third level state-action space produces the advanced agent. The third level state space is the result of breaking down the second level state space into more precise states of knowledge. It includes seven entries that represent the mental state of the system:
-Does not know the user's country/knows the user's country.
-Does not know the user's region/knows the user's region.
-Does not know the common attributes/knows the common attributes.
-Does not know if there is an associated international concept/knows that there is an associated international concept/knows that there is not an associated international concept.
-Does not know the country associated concept/knows the country associated concept.
-Does not know the region associated concept/knows the region associated concept.
-Does not know if the student understood the associated concept/knows that the student understood the associated concept/knows that the student did not understand the associated concept.
Every entry can take any of its values, with all combinations giving us a total of 288 states, including 17 final states. To be in a final state, the agent should know the associated concept and should know that the user understood the associated concept. Moreover, the knowledge of the system should be consistent (e.g., a state in which the system knows the cultural associated concept but does not know either of the cultural attributes is not a final state). For the third level state space, the action set includes the following seven actions:
-Identify the user's country.

Observations
In order to obtain a policy, we create three sets of observations, one for each of the three agents: novice, intermediate, and advanced. Each set contains 1000 observations. The observations are designed to simulate the ones that would be noted by a wizard. The minimal number of observations was calculated taking into consideration the case where every action is taken from every state. In order for the simulation to be representative, the observations conform to the following assumptions:
-The wizard cannot find the country's associated concept or the region's associated concept if the user's country or region is not identified. As the associated concept is queried based on the student's cultural attributes, it is impossible to find it when this information is not provided.
-The wizard cannot find the associated concept if the common attributes that the student is asking about are not identified. In fact, the comparison between a target concept and an associated concept is queried based on the property the student is asking about. If the student is asking about the usage, the associated concept will be a concept in the student's culture that has the same usage.
-If wizards search for an international associated concept, they will find it around 20% of the time. We consider that it is infrequent for a concept to have an equivalent concept known internationally.
-If wizards present an associated international concept to the student, the student will understand it around 80% of the time. We consider that if a concept has an equivalent concept known internationally, the student will probably know it. This assumption was made based on the Pareto principle (Newman 2005).
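The assumptions above can be turned into a small simulator of one dialogue step. The following Python sketch is only an illustration: the state keys, the action names, and the choice that identifying the country or the common attributes always succeeds are assumptions made here, and outcomes not covered by the stated assumptions (such as the student's reaction to a cultural associated concept) are deliberately left unchanged.

```python
import random


def simulate_step(state, action, rng):
    """Return the next state (a new dict) after `action`, under the listed assumptions."""
    nxt = dict(state)
    if action == "identify_country":
        nxt["country"] = True  # assumed to always succeed
    elif action == "identify_common_attributes":
        nxt["common"] = True  # assumed to always succeed
    elif action == "identify_international_concept":
        # Needs the common attributes; found around 20% of the time.
        if state["common"]:
            nxt["international"] = "found" if rng.random() < 0.2 else "none"
    elif action == "identify_cultural_concept":
        # Impossible without the student's country and the common attributes.
        if state["country"] and state["common"]:
            nxt["cultural"] = True
    elif action == "ask_if_understood":
        # An international concept is understood around 80% of the time.
        if state["international"] == "found":
            nxt["understood"] = "yes" if rng.random() < 0.8 else "no"
    return nxt


initial_state = {
    "country": False,
    "common": False,
    "international": "unknown",
    "cultural": False,
    "understood": "unknown",
}
rng = random.Random(0)  # fixed seed so that a run can be repeated
state = simulate_step(initial_state, "identify_common_attributes", rng)
observation = (state, "identify_international_concept",
               simulate_step(state, "identify_international_concept", rng))
print(observation)
```

Repeating such steps and recording each `(state, action, next state)` triple would yield a set of simulated observations of the kind described in this section.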
The novice agent needs a small number of observations to cover all the actions that could be taken from every state (24 observations). The intermediate agent needs more observations than the novice agent to cover all the actions that could be taken from every state (864 observations). The advanced agent needs the largest number of observations to cover all the actions that could be taken from every state (2016 observations). Figure 2 plots the minimum number of observations versus the number of states.
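These minimum counts follow from multiplying the number of states by the number of actions, where the number of states is the product of the number of values each entry can take. A short Python check, using the entry sizes listed for each agent (the dictionary layout and function name are illustrative):

```python
from math import prod

# Number of values per state entry and number of actions, per agent.
AGENTS = {
    "novice": {"entry_sizes": (2, 2, 2), "n_actions": 3},
    "intermediate": {"entry_sizes": (2, 2, 2, 3, 2, 3), "n_actions": 6},
    "advanced": {"entry_sizes": (2, 2, 2, 3, 2, 2, 3), "n_actions": 7},
}


def min_observations(entry_sizes, n_actions):
    """Return (number of states, minimum observations to try every action in every state)."""
    n_states = prod(entry_sizes)
    return n_states, n_states * n_actions


for name, spec in AGENTS.items():
    n_states, n_obs = min_observations(spec["entry_sizes"], spec["n_actions"])
    print(f"{name}: {n_states} states, {n_obs} observations")
```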

Strategy evaluation
To evaluate the quality of the conversation strategy, we assume that the wizard follows the recommendations of the agent except when:
-The agent asks the wizard to take the same action twice or more, and the wizard knows that the action will not change the current state of knowledge and will keep the dialogue in the same state.
-The agent asks the wizard to present information to the student while that information is unavailable.
We define a score representing the quality of the policy by

\[ \mathrm{Score} = \frac{n}{N} \]

with n the average number of times the wizard followed the agent's recommendations per dialogue and N the average number of recommendations received per dialogue. The score of the quality of the policy varies between 0 and 1.
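The score, the ratio of followed recommendations n to received recommendations N, can be computed as in the minimal sketch below. The function name and the input validation are illustrative assumptions; with a single dialogue per agent, the per-dialogue averages reduce to the raw counts.

```python
def policy_quality_score(followed: float, received: float) -> float:
    """Return n / N, where n is the average number of recommendations followed
    per dialogue and N is the average number of recommendations received."""
    if received <= 0:
        raise ValueError("at least one recommendation must have been received")
    if not 0 <= followed <= received:
        raise ValueError("followed must lie between 0 and received")
    return followed / received


# Counts reported for single dialogues: 4 of 6 followed, and 8 of 8 followed.
novice_score = policy_quality_score(4, 6)
advanced_score = policy_quality_score(8, 8)
```

The validation keeps the result within the stated range of 0 to 1.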

The experiment
To evaluate the different policies, we prototyped a Wizard of Oz experiment set up as follows.

The participants
The experiment involved two participants:
-1 wizard: PhD student in informatics, in Japan (27 years old).
-1 student: female Italian language student who arrived in Japan 2 weeks before the experiment to learn Japanese (26 years old).
The student's role was to ask about a concept she did not understand. The wizard's role was to provide CSA to the student. The two participants did not know each other previously. We refer to the first participant as the wizard and to the second as the student. The only prerequisite for participation concerned the user of the system, who had to be a Japanese language student who had recently moved to Japan. We met both participants separately and explained the objectives and the rules of the experiment.
We met with the student before the experiment and gave her a list of food products. We explained that she had to choose an item she did not know, then ask for explanations about it through the system. We showed the student the system and explained how the interaction with it would take place. We also explained to the student that the system would help her understand the target concept and that she was interacting with a human being through the system.
We also met with the wizard before the experiment and specified the behavior to be adopted during the experiment. The wizard received training to become familiar with the objective of the dialogue, the actions that can be taken, and the database. We explained to the wizard that the dialogue strategy recommended by the system should be followed. We also described to the wizard the two situations in which the system's recommendation can be ignored: (1) the system asks the wizard to take the same action twice or more, and the wizard knows that the action will not change the current state of knowledge and will keep the dialogue in the same state; (2) the agent asks the wizard to present information to the student while that information is unavailable.

The setting of the experiment
• The wizard and the student interacted via two computers using Skype. The wizard typed, and the student heard the answers through the Clownfish plugin, which converts text to speech.
• The wizard had access to a simple database representation from which the CSA could be extracted based on the cultural attributes and common attributes.
• The wizard and the student were asked to perform the dialogue three times. The first time, the novice agent's strategy was suggested to the wizard. The second time, the intermediate agent's strategy was suggested to the wizard. The third time, the wizard was provided with the advanced agent's strategy.

Results of the experiment
While receiving the novice agent's strategy, the wizard followed the recommendation of the agent four times out of six, as shown in Table 3. The wizard reported that the recommendations of the agent were too abstract. They also reported that when the suggested action was to find the associated concept, the wizard found two associated concepts belonging to the same country. It was hard for them to present one to the student, as there was no appropriate guidance for this situation.
Table 3 Dialogue between the wizard and the novice agent and the wizard's compliance with recommendations (green, compliance; red, non-compliance)
While receiving the intermediate agent's strategy, the wizard followed the recommendation of the agent six times out of seven, as shown in Table 4. The wizard reported that the recommendations of the agent were helpful in guiding them through the process. They reported confusion when the student did not understand the first associated concept and the recommended action did not change. The wizard then had to take actions that differed from the agent's recommendations.
While receiving the advanced agent's strategy, the wizard followed the recommendation of the agent eight times out of eight, as shown in Table 5. The wizard reported that the recommendations of the agent were helpful and precise enough to guide them through the process. They reported that the process was conducted without any confusion.
Table 4 Dialogue between the wizard and the intermediate agent and the wizard's compliance with recommendations (green, compliance; red, non-compliance)
Figure 3 shows the score of the quality of the policy by the number of states. For the novice agent, the quality of the policy is 0.66, which is poor compared to the intermediate agent (0.875) and the advanced agent (1). Table 6 summarizes the evaluation and the recommended usage of each agent.

Discussion
There is a need for developing and designing language tools that support a more complex view of language than the traditional formal approach and allow users to explore meaning-related aspects of the target language. It is suggested that a user-centered and iterative design process would be a good starting point for designing such language tools (Knutsson et al. 2008). We proposed a tool that aims to support students in understanding foreign concepts and building intercultural competence. The interviews conducted led to the creation of a user-centered system. We propose to use CSA to understand foreign concepts. CSA are based on Byram's model, a widely accepted model that defines intercultural competence, and more particularly on the skill of interpreting and relating. Byram's model proposes five skills needed to accomplish intercultural communication: (1) intercultural attitudes, (2) knowledge, (3) interpreting and relating, (4) discovery and interaction, and (5) critical cultural awareness (Byram 1997).
Table 5 Dialogue between the wizard and the advanced agent and the wizard's compliance with recommendations (green, compliance; red, non-compliance)
Figure 3 Score of the quality of the policy by the number of states. The quality of the policy is defined by \(\mathrm{Score} = n/N\), with n the average number of times the wizard followed the agent's recommendations per dialogue and N the average number of recommendations received per dialogue. The score varies between 0 and 1.
Many studies have used computer-mediated communication to develop second language learners' intercultural competence based on Byram's model. However, most past research focused on systems that help develop the skills of intercultural attitudes, knowledge, and discovery and interaction (Liaw 2006). This study, unlike previous studies, aimed at using computer-supported communication to develop the skill of interpreting a concept from another culture and relating it to concepts from one's own.
The method proposed in this research allows the creation of automatic CSD strategies to support foreign students during their food shopping in Japan. The method could potentially be generalized to learn automatic dialogue strategies in any situation where CSA may be needed and little initial data or system exists. Technical and nontechnical limitations of the system are highlighted and discussed below.
Table 6 Summary: the score of the quality of the policy, the minimum number of observations required, and the recommended usage of each agent. Recommendation for the advanced agent: use this policy when the number of observations is larger than 2016.

Range of application
The proposed system supports students in understanding foreign concepts. The system focuses on culturally specific concepts expressed through words (e.g., udon, tahine, kimono, and paella). However, it does not support students in understanding culturally specific sentences such as idioms or proverbs. Applying the proposed method to culturally specific sentences in future work might lead to interesting results.

Dialogue patterns
The number of collected dialogue patterns was based on interviews conducted in Nishiki Market. Our state spaces were derived from these dialogue patterns. However, the collected patterns may not extensively cover all the culturally situated scenarios that could happen. A more extensive survey should be conducted to cover the vast majority of the questions that might be asked and thus allow the identification of more dialogue patterns.

State spaces
In this work, we chose three state spaces as a base for our learning algorithm. As a result, three agents could provide the wizard with dialogue strategies varying from an abstract strategy to a precise one. The state spaces were extracted by breaking down the attributes derived from the dialogue patterns. However, the attributes could be broken down further, yielding more elaborate strategies. This work explores only three state spaces and their resulting strategies; breaking the attributes down further would allow us to study more developed agents.

Minimal number of observations needed
The observations fed to the learning algorithm are one of the main components defining the resulting strategy. In this work, the minimal number of observations was calculated for the case where every action is taken from every state. However, the minimal number of observations needed to produce an effective strategy depends on the actions and the states actually visited.
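One way to examine this dependence is to check which state-action pairs a given set of observations actually covers. The sketch below is illustrative only: it assumes that each observation is a tuple whose first two elements are the state and the action, which is a hypothetical representation and not the format used in this work.

```python
from itertools import product


def uncovered_pairs(states, actions, observations):
    """List the (state, action) pairs that never appear in the observations.

    Each observation is assumed to start with (state, action); any further
    elements, such as the next state, are ignored here.
    """
    visited = {(obs[0], obs[1]) for obs in observations}
    return [pair for pair in product(states, actions) if pair not in visited]


# Toy example with two states and two actions (hypothetical labels).
toy_observations = [(0, "ask", 1), (1, "present", 0)]
missing = uncovered_pairs([0, 1], ["ask", "present"], toy_observations)
print(missing)  # [(0, 'present'), (1, 'ask')]
```

An empty result means that the observation set reaches the full-coverage case used to compute the minimums reported above.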

The experiment
The objective of the experiment was to compare the performance of the three agents for the same request made by the student. As the objective is the comparison of the dialogues, one student-wizard pair might be adequate. In the experiment, the dialogue initiated by the student was chosen based on the question most frequently asked by the tourists in the interviews (what is this?). However, it would be beneficial to broaden the range of dialogues in order to compare the agents in different real-life situations. This could be explored later in a study of the system itself.

Conclusion
In this research, we propose a method to learn culturally situated dialogue strategies to support foreign students using a reinforcement learning algorithm. Since no previous system was implemented, the method allows the creation of dialogue strategies when no initial data or prototype exists. To model the possible state spaces of the reinforcement learning algorithm, we first identified common dialogue patterns that take place between students and shop owners in Nishiki Market and extracted the attributes needed to conduct Culturally Situated Dialogues. By breaking down the extracted attributes into more fine-grained attributes, we created three attribute sets with different levels of granularity. Each of these three attribute sets was mapped into a different state space, resulting in the creation of three different agents: the novice agent, the intermediate agent, and the advanced agent. Each of these agents learns a different dialogue strategy. We conducted a Wizard of Oz experiment during which the agent's role was to support the wizard in their dialogue with students by providing them with the appropriate action to take at each step. The resulting dialogue strategies were evaluated based on two criteria: the quality of the strategy and the minimum number of observations needed to result in an acceptable dialogue strategy. The quality of the dialogue strategy was defined to reflect the 'helpfulness' of the agent in supporting the wizard. The novice agent was the least effective in producing helpful dialogue strategies for the wizard; however, it could learn the strategy based on only 24 observations. The intermediate agent performed better than the novice agent but needed at least 864 observations to learn a consistent strategy. The advanced agent was able to guide the wizard effectively through all the steps until the objective of the dialogue was achieved, and it needed a minimum of 2016 observations to produce a consistent strategy.
The results suggest the use of the novice agent at the first stages of prototyping the dialogue system. The intermediate agent and the advanced agent could be used at later stages of the system's implementation. Future work could explore the possibility of automating the migration to more complex agents depending on the number of observations available at each moment. This would allow the application of this technology to a variety of situations where culturally situated information is needed and no initial system or only a few observations exist.
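The migration idea mentioned as future work could, for instance, take the form of a simple threshold rule over the reported minimums (24, 864, and 2016 observations). The sketch below is purely hypothetical and is not part of the proposed method: the function name, the use of "at least" as the comparison, and the choice to return None below 24 observations are all assumptions.

```python
# Minimum observations per agent as reported in the text, most complex first.
MIN_OBSERVATIONS = [
    ("advanced", 2016),
    ("intermediate", 864),
    ("novice", 24),
]


def select_agent(num_observations: int):
    """Pick the most complex agent whose reported minimum is already met.

    Returns None when even the novice agent's minimum is not reached.
    """
    for name, minimum in MIN_OBSERVATIONS:
        if num_observations >= minimum:
            return name
    return None
```

For example, `select_agent(1000)` would return the intermediate agent, since 1000 observations exceed its minimum of 864 but not the advanced agent's 2016.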

Abbreviations
CSA: Culturally situated association; CSD: Culturally situated dialogue; WoZ: Wizard of Oz