Effects of using the first principles of instruction in a content and language integrated learning class

The aim of this study was to examine the effects of Content and Language Integrated Learning (CLIL) designed according to the First Principles of Instruction (FPI). A 15-h Japanese CLIL course was implemented. A total of 16 university students attended the course, and data from multiple sources, including learning tests, questionnaire feedback, and dialogues in group discussions, were collected and examined. Analysis showed that students’ learning outcomes, including basic Japanese proficiency, intercultural communication content, and writing skills, improved statistically significantly. Students had a high level of awareness of the FPI elements designed into the course. In addition, all the FPI elements except the application element had a positive impact on basic Japanese proficiency; the problem-centered, application, and integration elements positively impacted intercultural communication content and writing skills. The results show that students displayed individual differences in using the worksheet to summarize their writing ideas. Students spent most of the group discussion time speaking their native language. Even when Japanese was used, it was mostly individual Japanese words rather than whole sentences. The results of the quantitative and qualitative analyses showed that the use of FPI, a problem-centered theory, had a positive impact on the design of the CLIL course. However, attention is needed to students’ individual differences and to guiding students in applying basic language knowledge in problem-centered learning activities. Finally, the study notes points that should be considered when designing CLIL in the future.

use of the language (Dale & Tanner, 2012). CLIL classes have made some achievements in FL. CLIL is widely used in courses in different subjects, such as mathematics, geography, and science (Dourda et al., 2014; Leal, 2016; Ouazizi, 2016). The results of these practices indicate that students' subject knowledge and language knowledge improved. Although research on CLIL practices has been ongoing since its introduction in the 1990s, its advocates understand and interpret the approach in different ways. Coyle et al. (2010, p. 1) provide a succinct definition that refers to CLIL's specific features: "Content and Language Integrated Learning (CLIL) is a dual-focused educational approach in which an additional language is used for learning and teaching of both content and language." According to this definition, CLIL can include a wide range of educational practices. Some studies have treated CLIL as an "educational approach" (Mehisto, 2008; Pérez-Vidal & Juan-Garau, 2010), while others have considered CLIL to be the actual instructional techniques and practices used in classrooms (Ball & Lindsay, 2010; Hüttner & Rieder-Bünemann, 2010). Other scholars have considered CLIL primarily from a course perspective (Langé, 2007; Navés & Victori, 2010). Pladevall et al. (2011) point out that CLIL offers flexibility in course design and scheduling, yet balancing language and content in course design is complex. Meyer (2010) proposes a design sheet to be used when designing CLIL courses, but he also points out that CLIL is not a specialized course design theory. Furthermore, Hao and Yamada (2021) pointed out that few studies over the past 20 years have analyzed CLIL practices from an instructional design (ID) perspective. ID is the framework in which teachers follow planned teaching and learning steps (Richards & Lockhart, 1994), and it covers a wide range of fields (Reiser, 2001).
ID has been shown in numerous studies to be an effective tool for improving course effectiveness (Hernandez, 2016; McGee & Reis, 2012; Richards & Lockhart, 1994). Therefore, applying ID, which has become a standard theory in the area of teaching and syllabus design (Honebein, 2019), to CLIL may provide a new perspective for CLIL improvement.
The First Principles of Instruction (FPI) is an integrated ID theory (Merrill, 2002) and offers principles needed for effective course design. The widely endorsed FPI, developed under the influence of constructivism, outlines five requirements necessary for realizing an effective learning environment. Many empirical studies have shown that students' academic performance and satisfaction with the course improve after learning through courses designed using FPI (Gardner, 2011; Lo, 2015; Tu & Snyder, 2017). To explore how to use FPI to improve CLIL course design, it is desirable to establish a pedagogical foundation for the practice of CLIL classes in terms of FPI.
In this study, we designed and implemented a course combining elements of CLIL with FPI in the context of intercultural communication in Japanese and, as a formative evaluation, evaluated the effects of the designed CLIL course. The purpose is to discuss the effectiveness of CLIL courses designed using FPI, identify the FPI elements that positively impact CLIL based on the results, and make recommendations for future CLIL designs.

Content and language integrated learning
CLIL, proposed in Europe in the early 1990s, is defined as an educational approach used for both content and language learning and teaching (Coyle, 2007). CLIL has two central characteristics and consists of four elements (4Cs). Its two central characteristics are (1) the integration of language and learning content involving a lesson design via which teachers teach educational content in a foreign language and (2) the integration of communication and intercultural understanding to deepen students' cultural understanding through natural language communication. The 4Cs are (a) content, posing a progression in knowledge, skills, and understanding related to specific elements of a defined curriculum; (b) communication, using language to learn while learning to use language; (c) cognition, developing thinking skills that link conceptual formation, understanding, and language; and (d) culture, exposure to alternative perspectives and shared understandings, which deepen the awareness of otherness and self (Coyle, 2006). Coyle (2006) called these four elements the 4Cs and suggested that a successful CLIL class should include them all. Sabet and Sadeh (2012) pointed out that CLIL classes can improve students' confidence to use a second language and significantly improve their language proficiency and thinking ability. CLIL can promote the use of foreign language learning strategies and geographic knowledge, while at the same time improving students' reading comprehension, vocabulary, and learning satisfaction (Dourda et al., 2014). Kanamura and Miyajima (2016) implemented CLIL in Japanese language education for international students whose primary purpose is to learn Japanese law. Specifically, lessons were held as a one-year lecture series designed to support group activities. At the end of the course, students' legal knowledge and Japanese speaking skills were shown to have improved significantly.
Although the effectiveness of CLIL for content and language learning has been widely validated since its introduction in the 1990s, there is still some controversy over the design of CLIL courses. CLIL is considered a combination of foreign language learning and content-based instruction (CBI) (Cenoz et al., 2013). CBI has been defined as "the teaching of content or information in the language being learned with little or no direct or explicit effort to teach the language itself separately from the content being taught" (Richards & Rodgers, 2001, p. 204). CBI can be used to develop learners' language proficiency by providing them with meaningful content (Crandall, 1999). CBI can be effectively applied to foreign language learning in various contexts, but it is not a specific instructional theory and therefore requires a specific course design when it is used (Heo, 2006). CLIL faces the same issue. CLIL requires an emphasis on both content and language, but the ratio of these two aspects is not strictly prescribed, which allows it to be used very widely, but at the expense of accuracy (Cenoz et al., 2013). Thus, the definition of CLIL allows it to be widely used in courses that include specialized knowledge learning and language learning, but it does not have a strict design theory (Pladevall et al., 2011). Filice (2020) notes that although a CLIL experiment had a positive effect on the learning of subject knowledge during group discussion, students had difficulties with language production, especially in formulating questions and using professional terms. Filice (2020) also points out that simply combining content and language in the CLIL class may not be effective for learning content and language. Thus, further specification and design of the course are needed, which requires further experimentation with CLIL involving different designs to provide more empirical research findings.
Agustín-Llach (2016), for example, conducted a three-year controlled experiment on primary school students between and 9-10 years of age. The CLIL and non-CLIL groups completed a writing task, and learners' vocabulary scores were then compared. The results showed that although the CLIL group's scores were higher than those of the non-CLIL group, there was no significant difference between the two.
Through practice, CLIL researchers have concluded that students need more opportunities in the CLIL classroom to apply both what they have learned and the language (Evnitskaya, 2014; O'Dwyer & de Boer, 2015; Yufrizal & Huzairin, 2017). One commonality among these findings is the focus on active approaches to learning. Yufrizal and Huzairin (2017) pointed out that giving students real-life problems can increase their motivation and give them more opportunities to practice what they have learned. This recommendation coincides with the general prescription of using real-world problems when teaching any complex skill (van Merrienboer, 1997).

Instructional design
Several researchers have attempted to improve CLIL from an ID perspective (Langé, 2007; Navés & Victori, 2010; Meyer, 2010). ID theory refers to theory that provides help and guidance for people to learn better (Reigeluth, 1999). The field of ID and technology covers a wide range of areas. According to the definition of the Association for Educational Communications and Technology (AECT), the field includes six categories of activities or practices: (a) design, (b) development, (c) utilization or implementation, (d) management, (e) evaluation, and (f) analysis (Seels & Richey, 1994). The areas it covers are still expanding (Reiser, 2001).
Although Meyer (2010) designed worksheets for use in CLIL course design based on the four features (4Cs) of CLIL, he also pointed out that CLIL is not a specialized course design theory and that more research is needed to explore the design of CLIL. Hao and Yamada (2021) reviewed nearly 20 years of CLIL practice papers and found that few CLIL practices design courses from the perspective of ID. ID encompasses numerous theories and models and has been widely used across fields (Hernandez, 2016; McGee & Reis, 2012; Reiser, 2001; Richards & Lockhart, 1994). The use of ID may provide new ideas for the design of future CLIL.
As described in the introduction section, CLIL was proposed on the basis of constructivism (Coyle et al., 2010). Some principles in existing ID theories have proved necessary in course design. In 2002, Merrill tried to integrate these principles and proposed the First Principles of Instruction (FPI) theory. As a strategy common to numerous ID models and theories, FPI was proposed under the strong influence of constructivism, and it summarizes the five requirements necessary for realizing an effective learning environment. FPI is also a problem-centered theory that promotes active student learning. Building instruction around real-life problems, or ones that the learner will face after the class is complete, is key to the design and delivery of effective instruction (Merrill, 2002, 2013; van Merrienboer, 1997).
Therefore, considering FPI, which is a problem-centered theory based on constructivism, in CLIL may provide a new perspective for CLIL improvement.

First principles of instruction
In 2002, Merrill proposed the FPI, which integrates many ID and learning models. As a common strategy for many ID models, the FPI, which was strongly influenced by constructivism, is a compilation of the five requirements necessary to realize an effective learning environment. Specifically, it is organized as follows: "(a) Problem-centered: Learning is promoted when learners are working to solve real-world problems. (b) Activation: Learning is promoted when existing knowledge is activated as a basis for new knowledge. (c) Demonstration: Learning is promoted when new knowledge is presented to the learner. (d) Application: Learning is promoted when the learner applies new knowledge. (e) Integration: Learning is promoted when new knowledge is integrated into the learner's world" (Merrill, 2002, pp. 43-44) (Fig. 1).
FPI emphasizes developing the course according to five principles (Merrill, 2002, 2013). At the beginning of the course, the teacher gives the students a problem to solve. Students are expected to solve the problem through learning activities that follow the activation, demonstration, application, and integration principles. As students become more capable, the problems they solve become progressively more difficult. This ensures that the new knowledge students have learned can be applied. Gardner (2011) implemented instruction based on the FPI and compared an experimental group using FPI with a control group not using it; the test results of the experimental group improved significantly. Students who studied microevolution using problem-centered instruction were more confident in their ability to solve problems in the future. In addition, FPI has been used in the design of FL courses. Lo et al. (2018) developed a flipped course applying FPI and showed that students' scholastic ability in mathematics, physics, and the Chinese language improved through practice.

Research questions
This study aims to design, conduct, and evaluate FPI-based CLIL classes in order to provide more practical experience in the design of CLIL courses to improve teaching and learning. In this study, CLIL using FPI was designed, and its educational effects, such as learning performance in vocabulary and grammar, understanding of content, and improvement of writing ability, were examined. In addition, Filice (2020) pointed out that most past research on CLIL has analyzed it from the teacher's point of view, often neglecting students' perceptions, yet student feedback is vital for future course design. This prompted us to investigate how the students perceive the course. Therefore, we analyze the students' perspectives as well as their learning behaviors and suggest improvements for future CLIL. This study sought to answer the following research questions:
RQ 1. How effective is the learning outcome of CLIL courses designed using FPI?
RQ 2. How aware are students of the designed FPI elements?
RQ 3. What elements of FPI have a positive impact on CLIL courses?
[Fig. 1 (Merrill, 2002, p. 45)]

FPI-based CLIL
In this study, the Japanese CLIL course following the FPI elements was designed based on the framework for problem-centered instruction by Merrill (2013). The process of CLIL in this study is presented in Tables 1 and 2. Several authors have demonstrated the successful application of FPI elements in a variety of settings (Gardner, 2011; Gardner et al., 2009; Lo et al., 2018; Mendenhall, 2012).
The Japanese CLIL in this study was designed following FPI elements. The design procedure had two main parts. First, according to the requirements of the course, the students' learning content and learning objectives were set. The learning content was intercultural communication. The learning objectives were divided into CLIL's content objective, to understand the concepts of intercultural communication presented, and CLIL's language objective, to learn the basics of the Japanese language and use Japanese to write.
Next, every lesson in this study is designed according to the five elements of FPI: Problem-centered, Activation, Demonstration, Application, and Integration. demonstrates how the principles were implemented in the course for the experimental condition.

Participants
The participants in this study were 16 third-year students in a course for Japanese language majors at a university in China. The students were between the ages of 20 and 21, with 12 females and 4 males. Their general Japanese language ability was at the N2 or N3 level of the Japanese Language Proficiency Test. The N2 level is described as "the ability to understand Japanese used in everyday situations, and in a variety of circumstances to a certain degree," and the N3 level as "the ability to understand Japanese used in everyday situations to a certain degree" (Japan Foundation & Japan Educational Exchanges and Services, 2012, p. 78). Before the class, the researcher explained to the students the purpose of this study and the methods of data collection and processing, and obtained the students' consent. Classes were conducted three hours a day for five days, and the content was intercultural education. All lesson content was taught in Japanese.

Data collection
Before the course started, participants completed a pretest. After the class on the last day, a posttest and a post-questionnaire were administered. The post-questionnaire was used to evaluate whether students recognized the elements of FPI. The FPI questionnaire was based on the Teaching and Learning Quality (TALQ) scales by Frick et al. (2009). For example, the question about the problem-centered element was, "I solved authentic problems or completed authentic tasks during this course"; the question about activation was, "In this course, I was able to recall, describe, or apply […]"
[Table fragment: Using the worksheet and discussion with the group members, explore students' own inherent impressions of the Japanese people and the reasons for such impressions]
Page 8 of 28 Hao et al. Asian. J. Second. Foreign. Lang. Educ. (2023)
The test consisted of three parts. The first part was drawn from the Japanese Language Proficiency Test (JLPT) and included N1- and N2-level questions: six reading comprehension questions and two listening comprehension questions. The N1 questions were worth 2 points and the N2 questions 1 point, for a total of 12 points. The second part, the content test, was an essay question. Students were asked to answer this question: Please define and explain the difference between stereotypes, prejudice, and discrimination. This question has a correct answer. The reference answer is as follows. A stereotype is a categorized image one has of a group of people. Prejudice is a stereotype accompanied by negative feelings (although some argue that it is not necessarily negative). Discrimination is prejudice further associated with behavior. These correct answers were explained in class. For each concept, one point was given for a description of the concept and one point for an explanation of how it differs from the other concepts. Each concept was worth 2 points, for a total of 6 points.
The third part of the test required students to write about 300 words. The theme was, "Write your own thoughts on how to use the aforementioned theories when communicating cross-culturally." For the evaluation standard, we referred to the "JF Japanese Education Standard" (JF standard for Japanese-language education, 2010) and created an evaluation standard that matched the Japanese ability of the students. The evaluation was based on three items: "content," "grammar/vocabulary," and "composition." The maximum score for each evaluation item was 4 and the minimum score was 1; thus, the maximum total score was 12. For example, when grading content, a score of 4 was given if the main content was explained in detail so that the reader could understand the central idea. A score of 3 was given if the explanation was lacking and the content could not be fully understood. A score of 2 was given if the reader could only vaguely understand the subject matter. A score of 1 was given if the lack of explanation made the content difficult to understand. When grading grammar/vocabulary, a score of 4 was given if the grammar and expressions related to the topic were accurately applied. A score of 3 was given if there were some word or grammatical errors but the sentences could be understood. A score of 2 was given if some information was not conveyed due to grammatical errors. A score of 1 was given if the sentences were incoherent and did not convey the message due to grammatical errors. When grading composition, a score of 4 was given for paragraphs and sentences that were connected with appropriate words and phrases and had a clear paragraph structure. A score of 3 was given if the relationship between some sentences was difficult to understand. A score of 2 was given for a simple list of ideas with no connection. A score of 1 was given for fragmented sentences and words that did not form sentences. Specific evaluation criteria are detailed in Appendix 1.
In order to increase the opportunities for students to use what they had learned in class, the teacher assigned tasks to students based on the class content. Students were asked to solve the tasks through group discussions. Activity 1 was a learning activity on self-disclosure. Activity 2 was a learning activity on stereotypes, prejudice, and discrimination. Details of the two activities are described in Table 2. To capture the relationships among the 4Cs of CLIL that the students focused on when solving the problem, the learning outcomes, and the five designed FPI principles, the students' group discussions were recorded and coded for analysis.
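The scoring rules for the content and writing parts can be illustrated with a minimal Python sketch. It only encodes the point ranges stated in the text (three writing items rated 1 to 4; three concepts with one point for the description and one for the explained difference). The function names, dictionary keys, and example ratings are hypothetical and are not taken from the study or from Appendix 1.

```python
from typing import Dict

# Item and concept names are illustrative labels, not the study's own coding scheme.
WRITING_ITEMS = ("content", "grammar_vocabulary", "composition")
CONCEPTS = ("stereotype", "prejudice", "discrimination")


def writing_total(scores: Dict[str, int]) -> int:
    """Sum the three writing items, each rated from 1 to 4 (total 3 to 12)."""
    for item in WRITING_ITEMS:
        if not 1 <= scores[item] <= 4:
            raise ValueError(f"{item} must be rated from 1 to 4")
    return sum(scores[item] for item in WRITING_ITEMS)


def content_total(described: Dict[str, bool], differentiated: Dict[str, bool]) -> int:
    """One point per concept described plus one per difference explained (total 0 to 6)."""
    return sum(int(described[c]) + int(differentiated[c]) for c in CONCEPTS)


# Hypothetical ratings for one student, for illustration only.
example_writing = {"content": 3, "grammar_vocabulary": 2, "composition": 4}
example_described = {"stereotype": True, "prejudice": True, "discrimination": False}
example_differentiated = {"stereotype": True, "prejudice": False, "discrimination": False}

print(writing_total(example_writing))
print(content_total(example_described, example_differentiated))
```

With these hypothetical ratings the writing total is 9 out of 12 and the content total is 3 out of 6.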
The worksheets that the students created during the activity are shown in Figs. 2 and 3.

Data analysis methods
First, the data were tested for normality using the Statistical Package for the Social Sciences (SPSS) version 26. Since the data were not normally distributed, the Wilcoxon signed-rank test was used to compare the pretest and posttest, and Spearman's rank correlation coefficient was used to analyze students' learning outcomes. In addition to the mean, the median is also reported in the results. The total score was 12 for the basic Japanese test, 6 for the content test, and 12 for the writing test. The mean and standard deviation of students' awareness of the designed FPI elements were also calculated. The qualitative data collected through group discussion were analyzed using MAXQDA 2020.

Results and discussion

Research question 1: how effective is the learning outcome of CLIL courses designed using FPI?
The Wilcoxon signed-rank test was used to measure the significance of changes between pretest and posttest learning outcomes. The results are shown in Tables 3, 4 and 6. As the results show, the students' learning outcomes improved through this CLIL course. Table 3 indicates that the mean score on the basic Japanese proficiency test improved from 6.13 (out of a full score of 12, SD = 2.09) to 10.00 (SD = 1.73). Additionally, a significant difference (p = 0.001) was noted in the distribution of basic Japanese scores. From this result, we can infer that the students' basic Japanese knowledge improved during this CLIL course. Table 4 indicates that the mean score on the content knowledge test improved from 2.00 (out of a full score of 6, SD = 1.51) to 4.62 (SD = 1.50). Additionally, a significant difference (p = 0.001) was noted in the distribution of content knowledge scores. The content question asked students to explain three concepts: stereotype, prejudice, and discrimination. These concepts have correct answers, which were presented to the students in class. The scoring criteria were determined after discussions with Japanese teachers based on the JF Standard. Each concept is worth 2 points, for a total of 6 points. The minimum score is 0 and the maximum score is 6. An example of a student's pretest and posttest answers in the learning content section is shown in Table 5. From this result, we can infer that the students' content knowledge improved during this CLIL course. Table 6 indicates that the mean score on the writing test improved from 4.19 (out of a full score of 12, SD = 1.00) to 9.38 (SD = 1.64). Additionally, a significant difference (p < 0.001) was noted in the distribution of writing scores. The students' writing was graded on three dimensions, content, grammar/vocabulary, and composition, according to the JF Standard for Japanese-Language Education (2010). Each dimension is scored on a scale of 1-4. Therefore, the maximum score is 12 and the minimum score is 3. An example of pretest and posttest answers on the writing test is shown in Table 7. From this result, we can infer that the students' writing skills improved during this CLIL course.
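The pre-post comparison above relies on the Wilcoxon signed-rank test, which the authors ran in SPSS. As a rough illustration of what the test computes, the following pure-Python sketch derives the positive and negative rank sums for paired scores. It is a simplified sketch, not the SPSS procedure: zero differences are dropped, tied absolute differences receive average ranks, and no p-value is calculated. The function name and the pre/post scores are hypothetical and are not the study's data.

```python
from typing import List, Sequence, Tuple


def signed_rank_sums(pre: Sequence[float], post: Sequence[float]) -> Tuple[float, float]:
    """Return (W_plus, W_minus) for paired scores, dropping zero differences."""
    diffs: List[float] = [b - a for a, b in zip(pre, post) if b != a]
    # Order the differences by absolute size so they can be ranked.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical pretest and posttest scores, for illustration only.
pre_scores = [5, 6, 7, 4, 8, 6]
post_scores = [9, 10, 7, 8, 9, 11]
w_plus, w_minus = signed_rank_sums(pre_scores, post_scores)
print(w_plus, w_minus)
```

In this made-up example every non-zero difference is positive, so the negative rank sum is 0 and the positive rank sum equals n(n + 1)/2 for the five non-zero pairs. A statistics package then converts the smaller rank sum into a p-value.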
In sum, the CLIL course designed using FPI had positive effects on language knowledge, writing skills, and content knowledge. In line with this finding, previous studies have also shown that a course designed using FPI has positive impacts on students' learning outcomes. Lo et al. (2018) found that by using FPI in a flipped course, students developed their scholastic ability and Chinese language ability and learned how to generate and organize ideas.
Because CLIL is characterized by using a foreign language to teach specialized knowledge, the goal is to improve both the foreign language and the specialized knowledge, which Coyle et al. (2010) note is not easily accomplished. Ennis (2015) also notes that CLIL courses require explaining what students do not know in a language they do not understand, which is a very challenging task. In addition, Agustín-Llach (2016) points out that there were no significant differences in student achievement between the CLIL and non-CLIL courses. It can thus be seen that simply combining language and content may not achieve the desired learning goals. However, the CLIL course designed using FPI in this study improved both content and language learning outcomes.
In addition, the writing test results suggest that having students use worksheets to organize their thoughts in this course was effective. For example, the worksheet used in Activity 2 helped the students present a problem in three stages. This helped students better understand the three-paragraph writing method. It was also evident from the example that the students' essays followed an introduction-first-second-summary composition. In fact, most students used this composition in the writing posttest. The teacher did not emphasize this writing format during the lesson, so the students appear to have internalized this form of composition on their own through this course. However, regarding the SD in Table 6, the post-SD (1.64) was higher than the pre-SD (1.00). Students' writing skills were strengthened through this lesson, but the disparity between students also widened. Although the writing design is effective for the course as a whole, the degree of understanding of writing content may differ due to individual differences. Allison et al. (1998) point out that higher education students undertake limited writing exercises in the classroom and require one-to-one writing instruction. In this course, although the teacher provided a worksheet to help students organize their writing ideas before they wrote, the teacher did not show students specific examples of writing. This may have resulted in some students not understanding how to use the worksheet, thus increasing individual differences according to their own comprehension. This is also reflected in the dialogue.
[Table text, student writing sample: There is an old Chinese proverb that says, "When I meet a crony, I'd like to drink tons of wine with him, but if there is a nuisance, I'll be reluctant to say just a few words." The meaning of this saying is that if you get to know each other, there are many things you can talk about inseparably, but if you don't talk to each other, there will be many half-words. In other words, it refers to values. According to that proverb, I used to think that there was no need to be friends with someone whose values and orientation did not match mine, but I have now changed my mind. I now understand that it is because of the compassion of the Japanese people that we should be inclusive. Next comes prejudice and discrimination. Some argue that prejudice is a stereotype accompanied by negative feelings, but not necessarily negative. Discrimination is prejudice further coupled with behavior. In other words, prejudice is people's awareness and discrimination is their behavior due to prejudice. We make the right decisions so that we develop a proper attitude to see people and things in solidarity through our own eyes and not through stereotypes. Finally, there is image.]
Among the four students in Group 1, one student was not clear about how to use the worksheet and asked about it twice during the discussion. The student finally completed the worksheet after repeatedly checking with the other group members. This is consistent with the results in Table 6: students differed in their understanding when using the worksheet.

G2-S2: This should be written in the part that you know that others do not know, right? Is it written in position 3?
G2-S1: Yes, here, you know, but others do not know.
One student in Group 2 also had a question about using the worksheet. The group members gave clear answers. Again, this group's discussion is consistent with the results in Table 4 in that there are individual differences among students.
Group 3
G3-S4: Been on a diet. (laughing)
G3-S3: But you are not really doing the exercise to lose weight either.
G3-S1: Yeah, I did not.
G3-S3: Been studying English.
G3-S1: Already written.
(8 seconds)
G3-S3: OK. Once again, there is an impasse.
It can be seen from the conversation that one student in Group 3 also raised questions about using the worksheet. However, the problem raised was not solved in the end. The discussion in Group 3 also showed consistency with the results in Table 6. In particular, asking questions that are not answered can exacerbate personal differences among students.

G4-S1: The second position is to write what others know and what I do not know, right?
(5 seconds)
G4-S3: Yeah, the second position is what others know and what you do not know yourself.
G4-S1: This is the Johari window in … others know that I do not know. OK, then a little more specific is…
G4-S4: It is your interest and what you like to do, and so on.
G4-S1: Like habits! It is one of those little habits that you know but others do not. And then there is the kind of mantra you have, you do not realize it, but others do.
G4-S4: Yeah, verbal words must be known to others because you always say them.
G4-S2: Oh, I see, some little gestures and habits!
G4-S1: Yeah, bad habits are something that others know about you, but you do not know about yourself.
G4-S4: Okay, so write about some bad habits in this position.
A student in Group 4 also asked a question about the worksheet. The group members responded positively, and the student who asked the question eventually solved the problem. It is also evident from the conversations in Group 4 that there were personal differences among the students, consistent with the results in Table 6.
From the conversations, it can be seen that all four groups of learners had problems with not knowing how to use the worksheet or what kind of content should be recorded in which position in the worksheet. Furthermore, it can be seen that each group addressed the issues raised by the members in different ways. Groups 1, 2, and 4 all solved this problem through discussion, but Group 3 did not solve these problems in the end. This is consistent with the results in Table 6. Students experienced some differences in understanding when using the worksheet, and problems may arise that cannot be resolved in the group, as demonstrated by the conversations in Group 3. This can also cause individual differences between students to grow. Students who already understand how to use worksheets can organize their writing ideas better through the use of the worksheet, while students who do not understand and cannot get answers from the group members become more confused. When designing a course to solve this problem, it is necessary to understand the learning situation of each student and provide support to the individual. In addition, the teacher should evaluate the ease of use of the worksheets before the formal course to improve the effectiveness of the worksheets.
Through this course, the students developed a strong awareness of the FPI principles followed in the course design. Table 8 shows the students' awareness of the designed FPI elements. On the questionnaire items measuring awareness of the FPI elements, every element received a high mean rating of more than 4 points, showing that the course was designed in a way that allowed students to experience the FPI elements. The integration element scored highest (mean = 4.53, SD = 0.27), followed by the activation element (mean = 4.38, SD = 0.32), the demonstration element (mean = 4.33, SD = 0.58), and the application element (mean = 4.33, SD = 0.27); the problem-centered element scored lowest (mean = 4.09, SD = 0.55).
Notably, students did not show much variation in their awareness of the elements in Table 8. However, as described in the course design section, the length of the course's application element was increased to provide students with more active learning opportunities. The application element accounted for 33% of the course time, while the integration element accounted for 22% of the course time. While the integration element received the highest score, the application element scored relatively low. Borg (2006) points out that effective language teaching requires rich classroom diversity and an environment where students can increase their participation and the quality of their involvement. This course was designed to ensure that students participated in the activities, and the dialogue content during group discussions was analyzed to capture the quality of learning during participation time.

The conversations related to the course content and Japanese language learning in the activities were coded. Of the 983 dialogues, 59 were related to lesson content and 155 were related to Japanese language learning. The students' conversations were coded into categories according to CLIL's definition of the 4Cs. The coding criteria were first drafted by the first author on the basis of CLIL's 4Cs principles and were then discussed with experts in Japanese language education and experts in ID (the co-authors). After the coding criteria were unified, the students' conversations were coded. In addition to the statements related to content and Japanese language learning, there were also statements about cognition and culture. Content was defined as explanations and discussions about what was being learned. For example, in Activity 2, students used a worksheet to summarize their changing impressions of Japanese people, so words related to impressions were judged to be related to learning content in this activity. Communication was defined as use of the target language or discussion of the words and grammar of the target language. Cognition was defined as the ability to summarize what has been learned in one's own words, or to give one's own opinion based on what has been learned; statements expressing one's own ideas were therefore coded as cognition. Culture was defined as collaborative work between groups or thinking about internationalization; statements that helped promote group collaboration, or statements about cultural exchange, were therefore coded as culture in this study.

Fig. 4 Proportion of statements related to the 4Cs in group conversations
In the two group activities, there were 983 conversations, of which 565 were related to the 4Cs: 155 to communication, 40 to cognition, 311 to culture, and 59 to content (Fig. 4).
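The proportions behind these counts can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the counts reported above; the variable and function names are illustrative, and whether Fig. 4 uses the 565 coded statements or all 983 conversations as its denominator is an assumption, so both are computed.

```python
# Counts reported in the text for the two group activities.
total_conversations = 983
four_c_counts = {
    "communication": 155,
    "cognition": 40,
    "culture": 311,
    "content": 59,
}

# The four categories should add up to the number of 4Cs-related statements.
four_c_total = sum(four_c_counts.values())


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of each category among the 4Cs-related statements (assumed denominator).
shares_within_4cs = {name: share(n, four_c_total) for name, n in four_c_counts.items()}

# Share of all conversations that were related to any of the 4Cs.
share_of_all = share(four_c_total, total_conversations)

for name, pct in shares_within_4cs.items():
    print(f"{name}: {pct}% of 4Cs-related statements")
print(f"4Cs-related: {share_of_all}% of all conversations")
```

Under these assumptions, culture-related statements make up a little over half of the coded statements, while content-related statements are roughly one in ten.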
Example 2. Group discussion about contents and Japanese.
Group 1
G1-S1: I wrote "同じ" (same). I think it's all the same.
G1-S4: The teacher is asking you when.
G1-S3: You see, what the teacher's saying is originally you thought the Japanese were like this, and then it changed.
G1-S4: Yeah, that means when.
G1-S2: どんなときですか (When is it?)
G1-S1: No change. It's the same as I thought it would be.
G1-S3: No, the teacher asked when you had changed.
G1-S1: That is, the teacher was asking when it had changed, but I don't think it has.
G1-S4: That is when it has or has not changed.
G1-S1: It hasn't changed.
G1-S4: You just write about one thing that inspired you.
G1-S1: I am now writing about "同じ" (same).
(Writing and painting)

Group 1 had 443 conversations in the two learning activities, including 19 conversations that included knowledge content and 30 conversations about Japanese. Moreover, as this dialogue demonstrates, most of the Japanese used consisted of single words, and even when full sentences were in Japanese, they merely repeated the question prompts presented. These conversations corroborate the results in Table 2. Students in Group 1 mostly used Japanese to confirm basic words, which may explain the improvement in scores on the basic Japanese test.
Group 2
Scene 1
G2-S2: What about this one?
G2-S3: Let's write another one
G2-S1: The third is to ask, at first you think so, and then after contact with them, what has changed, right?
G2-S2: Yeah. Just think of them for "真面目" (serious).
G2-S1: どんなときで (At what time). At what time was it? OK, then next.
G2-S2: Can this space be written down?
(5 seconds)
G2-S2: Just write keywords?
G2-S1: Yeah.
G2-S3: Right.
Scene 2
G2-S4: Eh, "見た", is there a small "つ"?
G2-S3: No, there is no "つ."

Group 2 had a total of 582 conversations during the two learning activities. Of these, 26 conversations were related to the knowledge content, and 77 conversations were related to Japanese. As the example for Group 2 shows, and as in Group 1, most of the Japanese sentences simply confirmed the question prompt, and most of the other Japanese appeared as single words. However, there were many instances of confirmation and discussion of Japanese words in Group 2. These conversations corroborate the results in Tables 2 and 5. Although conversations about knowledge content and Japanese did not make up the majority of the group discussions during the application time, the discussions were still meaningful for confirming and consolidating the words learned.
Group 3
G3-S1: This is OK.
G3-S2: What about the specific things? Like they do something that makes you think they are like clouds.
G3-S3: That is, the characteristic is 優しい (gentle). Then they are polite. Specifically, the first impression was obtained from the Japanese drama. Then actual contact with the Japanese, the impression has not changed. なし (none).
G3-S2: OK.

Group 3 had a total of 347 conversations in the two learning activities. Eight of these were about knowledge content and 34 about Japanese. As with the conversations in Groups 1 and 2, most of the Japanese that appeared consisted of single words rather than complete sentences. This is consistent with the results in Table 2, which show that students' scores on the basic Japanese test improved significantly.
Group 4
Scene 1
G4-S1: 最近読んだ本は何ですか? (What book have you read recently?)

In this CLIL class, considering the students' Japanese language level, students were not required to use Japanese to communicate in the group discussions. However, the students in Group 4 still tried to communicate in Japanese on their own initiative. This could explain the results in Table 2, which show a meaningful improvement in the students' Japanese language scores.
From the conversations in the four groups, it can be seen that the students did not spend all their time discussing the knowledge content and language learning during the application time. Some of the conversations were about how to complete the worksheet and confirm information about it. Although the results in Tables 2 and 3 show a significant improvement in Japanese language scores and course content scores, it is also clear from the group discussions that students spent limited time discussing course content and using Japanese. Furthermore, the use of Japanese during group discussions was mostly focused on confirming the problem topics and Japanese words. Only the members of Group 4 tried to communicate entirely in Japanese during the discussion. Therefore, future CLIL design should also pay attention to supporting group discussions, so that group members become more active and interactive and discuss the course content more.

Research question 3: what elements of FPI have a positive impact on CLIL courses?
To determine factors that might affect the cultivation of learning outcomes through CLIL, Spearman's rank correlation coefficient was used to analyze the correlations between the learning outcomes (pre-post basic test score, pre-post content test score, pre-post writing test score) and the five principles of FPI (post-course FPI questionnaire). The calculation proceeded as follows. First, the pre-post differences in the basic Japanese proficiency, content knowledge, and writing skills test scores were calculated. Then, the questionnaire item scores under each of the five principles in the FPI questionnaire were summed. Finally, because of the small sample size, Spearman's rank correlation was used to calculate the correlation between the score differences for basic proficiency, content, and writing and the total score for each of the five principles.
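The three steps of this procedure can be illustrated with a short sketch. It is not the authors' analysis script: the data below are hypothetical values for five students (not the study's data), all function and variable names are illustrative, and the coefficient is computed as the Pearson correlation of average ranks, which is the usual way to handle ties. Significance testing is left out.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for five students (NOT the study's data).
pre_basic = [40, 55, 62, 48, 70]
post_basic = [58, 60, 80, 75, 78]

# Step 1: pre-post difference for each student.
gain_basic = [post - pre for pre, post in zip(pre_basic, post_basic)]

# Step 2: summed questionnaire items for one FPI principle (hypothetical).
problem_centered_total = [17, 12, 18, 20, 14]

# Step 3: rank correlation between the gain and the principle total.
rho = spearman_rho(gain_basic, problem_centered_total)
print(f"Spearman's rho = {rho:.3f}")
```

Because the coefficient depends only on rank order, it is less sensitive to outliers and to non-normal score distributions than Pearson's coefficient, which suits a sample as small as the one in this study.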
The results in Table 9 show that the problem-centered principle and the integration principle correlate significantly with all learning outcomes; the activation principle and the demonstration principle correlate significantly only with the learning outcomes of basic knowledge; and the application principle correlates significantly with the learning outcomes of content knowledge and writing.
According to the correlation results in Table 9, some elements of FPI awareness reached a significant level of correlation with learning outcomes.
Correlations among the variables are presented in Table 9. The problem-centered element has a moderate positive correlation with the basic Japanese proficiency test score (0.554**, p < 0.01), a strong positive correlation with the content test score (0.602*, p < 0.05), and a strong positive correlation with the writing test score (0.813**, p < 0.01). As described in the course design phase, this course prepared a life-related problem for each class and had students learn the course content with their peers in order to solve the problem. The significant positive correlation that emerged between the problem-centered element and the learning outcomes in this course is consistent with the findings noted by Gardner (2011).
The activation element has a strong positive correlation with the basic Japanese proficiency test score (0.605*, p < 0.05) and no significant correlation with the content test score or the writing test score. A possible cause is that no obvious storyline ran through the lessons of the course. For example, the problem in the first lesson was "How do you and your friends self-explain to each other?" while the problem in the second lesson was, "What is your impression of Japanese people in real life, and how can you make better friends with foreigners?" Such problem settings may give students a weak sense of the connection between the lessons, leaving them unable to clearly appreciate the connection between what they learned in the previous and the current lessons. Merrill (2020) states that courses should be designed so that students solve problems relevant to real life and that the problems should be strongly connected to one another to better evoke what students have already learned.

(Writing and painting)

Group 4 had 195 conversations in the two learning activities, including 14 conversations in Japanese (7.18%). As in the other three groups, most of the Japanese that appeared in the conversations in Group 4 consisted of single words. This is consistent with the results shown in Table 9. Although the usage and pronunciation of words were discussed during the activities, overall, the use of Japanese was the least frequent in Group 4.
As the group discussion examples show, the students used their first language most of the time. Moreover, almost all of the Japanese appeared as single words, not as fully expressed sentences. Therefore, during the class time allotted to the application element, students actually used Japanese less than during the other FPI elements. Macaro et al. (2020) point out that when students learn complex knowledge, using their foreign language for discussion may cause a heavy cognitive burden. For this reason, this course was designed to allow students to use their native language in their discussions. However, Lo (2015) also points out that excessive use of a native language may pose an obstacle to learners' target language learning. Thus, in future course design, the use of native and foreign languages should be appropriately balanced. In addition, when using FPI for CLIL design, it is necessary not only to ensure that students have enough time for the learning activities but also to improve the quality of the activities.
The integration element has a strong positive correlation with the basic Japanese proficiency test score (0.641**, p < 0.01), a moderate positive correlation with the content test score (0.581*, p < 0.05), and a strong positive correlation with the writing test score (0.624**, p < 0.01). The correlation results are consistent with the questionnaire results about FPI awareness. Students had a high awareness of the integration elements in the course, and this awareness had a positive impact on students' basic Japanese language knowledge, writing skills, and content knowledge. Merrill (2013) points out that the integration element includes reflection, discussion, and creation. As described in the course design, this course gave students the opportunity to reflect on themselves through group discussions, present what they had learned, and think about how they could creatively apply what they had learned in their future lives. The results show that all these designs have had a positive impact on students' learning outcomes.

Conclusion and implementation
This study applied FPI elements to design a CLIL course to improve the Japanese language knowledge, intercultural communication content knowledge, and writing skills of 16 university students.
First, tests of basic Japanese proficiency, intercultural communication content, and writing skills were administered before and after the course to assess students' learning outcomes. The results indicated a significant improvement in all three domains. However, some individual differences in writing emerged. The analysis of the group conversations showed that the students experienced some confusion in using the worksheets. Furthermore, how to improve writing skills was not addressed at all in the discussions. This indicates that instructions on using the worksheets and guidance that helps students learn from each other during group discussions are essential.
Second, to clarify the FPI elements that affect learning outcomes, an FPI questionnaire was administered to assess students' awareness of FPI, and the relationships between FPI element awareness and learning outcomes were then assessed. The results indicate that all FPI elements had a positive impact on basic Japanese proficiency except for the application element; the problem-centered, application, and integration elements had a positive impact on intercultural communication content and writing skills. FPI, as a problem-centered theory, had a positive effect on the design of the CLIL course. In particular, there was a significant positive correlation between the problem-centered element and the content and language outcomes of CLIL in this course. This supports the hypothesis that the problem-centered course design approach is effective for CLIL. However, it is also clear from the results that the application element of FPI did not correlate significantly with the basic language test of CLIL. Therefore, care needs to be taken when designing future CLIL courses using problem-centered theory, and students need more guidance in applying basic language knowledge in the activities.
However, there are some limitations to this study. One limitation is that the number of participants was relatively small, the course was relatively short, and the design was not tested across many disciplines, which limits the generalizability of the results. Another limitation is the type of data used for the qualitative analysis. Although this study analyzed the students' group discussions, the students' learning behaviors could not be revealed from the conversations. A better understanding of students' learning behaviors in CLIL courses is needed in order to improve CLIL course design in a more targeted manner. Finally, because this was a practical course and the students were on a tight schedule, no delayed test was scheduled in this study. In future course designs and experiments, a delayed test could be implemented for learning outcomes that can be affected by short-term memory, such as learning content, to better verify the course's effect.
Still, the findings of this study can provide some ideas for CLIL course design when teaching Japanese learners of the same level, although the findings would have been more representative and convincing if a greater number of participants had been included, more time had been available, and more topics had been covered in the course. In addition, this CLIL course was not an exclusively target-language learning environment: students were allowed to use their native language during group discussions. In future CLIL courses, worksheets can be designed based on the principles of FPI to support CLIL conducted entirely in the target language. Such future investigation will contribute to CLIL course design. The methodology of this design can be applied to other topics in future CLIL courses to obtain more data for a more extensive validation of the feasibility of this course design. Furthermore, future research should also focus on the variability of students' individual opinions, thereby improving the quality of the design.

The four elements of Content and Language Integrated Learning