One size fits all? The role of task complexity in L2 production via the audio chat

The pervasive use of information and computer technology in second or foreign language learning has led researchers to explore which tasks best facilitate second language (L2) learning in technological environments. This study aimed to contribute to this area by examining the effects of task complexity, manipulated along the variable +-few elements in Robinson’s Cognition Hypothesis, on the L2 production of 42 lower intermediate Chinese EFL (English as a Foreign Language) learners. The learners completed two interactive tasks (simple versus complex) in dyads via the audio chat of the video-conferencing platform WeMeet in a laboratory setting. Participants also rated the difficulty of the tasks by responding to a self-rating questionnaire immediately after completing each task. Their L2 output in the two tasks was recorded, transcribed and coded on three dimensions, namely syntactic complexity, lexical complexity and accuracy. SPSS 26 was used for statistical analyses. The results revealed that increasing task complexity induced significantly more lexically complex language. However, it did not result in significant changes in the syntactic complexity or accuracy of learners’ L2 output via audio chat. These results contradicted the predictions of the Cognition Hypothesis, suggesting that the Cognition Hypothesis may not apply to audio chat.


Introduction
The evolution of information and computer technology has profoundly influenced the way people communicate. It has also had a substantial impact on literacy and education in recent years (e.g., Assi & Rashtchi, 2022; Bagheri & Mohamadi Zenouzagh, 2021; Jack & Higgins, 2019; Lafford & Lafford, 2013; Li, 2022; Liu, 2018; Niño, 2020; Pourdana, 2022). Considerable attention has been paid to how computer technology, in particular computer-mediated communication (CMC), can best be used to support language learners. CMC refers to any interpersonal exchange via the medium of computers (Herring, 1996). It takes two forms (Yilmaz, 2011): asynchronous CMC (ACMC), which involves delayed interaction (e.g., emails), and synchronous CMC (SCMC), which takes place in real time (e.g., text, audio and video chat).
How to design online tasks that work best for L2 learners has become an area of critical importance in computer-mediated L2 learning research. For example, Peterson (2010) called for research in second language acquisition (SLA) that explores appropriate task designs to maximize the potential of interaction via the online medium. Robinson's (2001) Cognition Hypothesis offers a framework for task design in traditional oral face-to-face (FTF) contexts based on intentionally manipulating the cognitive complexity of the task. Inspired by this framework, this study intended to provide new insights into task design for technological settings by examining the effects of task complexity on L2 production.
A substantial body of research has been motivated by this model, making considerable contributions to task and syllabus design in FTF and pen-and-paper writing contexts (Cho, 2018; Vasylets, 2017; Kim & Payant, 2017; Awwad et al., 2017; Rahimi & Zhang, 2017; Abrams, 2019; Abdi Tabari, 2020; Lee, 2020; Luo, 2022; Kim, 2020; Li et al., 2023; Liang & Xie, 2023; Abdi Tabari et al., 2023; Xu et al., 2023). Although the Cognition Hypothesis has been largely confirmed in FTF environments, several pioneering studies investigating Robinson's model in text-based SCMC settings indicated that this theory may not easily transfer to computer-mediated tasks and contexts (Adams & Nik, 2014; Adams et al., 2015; Baralt, 2013; Nik, 2010). Researchers attributed these results to the affordances of text chat. For instance, Adams and Nik (2014) suggested that features of text-based SCMC, such as the planning and editing opportunities, may have freed learners' attentional resources to focus on linguistic forms, hinting that the effects of task complexity were mediated by these characteristics of text chat. Furthermore, Adams et al. (2015), who found a trade-off effect between accuracy and complexity (which runs against the Cognition Hypothesis), argued that while the visual transcript of the communication in text chat may make learners conscious of their own writing, the speed of exchange in text chat may also impede them from attending to linguistic complexity compared to pen-and-paper collaborative writing.
Nonetheless, these studies have been conducted only in text-based SCMC, neglecting other SCMC modes (e.g., audio-based SCMC). Unlike FTF communication, audio chat affords no visual cues. As a result, learners cannot depend on paralinguistic communication (e.g., facial expressions or body language) to understand or convey meaning. Instead, they have to rely on verbal explanation (Yanguas, 2010) to ensure that the intended message is properly delivered. Additionally, audio chat demands more real-time communication, which limits the possibilities for online planning compared to text-based SCMC. Therefore, it is worth asking to what extent the Cognition Hypothesis holds true in audio-based SCMC, a gap also pointed out by Smith and González-Lloret (2020).
Furthermore, the field of SCMC studies has a recognized limitation resulting from privileging text-based SCMC over audio- or video-based modes (Smith & González-Lloret, 2020; Ziegler, 2016). This neglect is problematic because audio- and video-based modes have different features from the text-based mode and therefore deserve to be studied in their own right (Ziegler, 2016). In addition, audio and video modes may help diversify the applications of computer technologies in L2 instruction and make L2 learning more comprehensive (Lamy & Hampel, 2007).
In accordance with these ideas, this study set out to test the validity of the Cognition Hypothesis in an audio-based SCMC context. In doing so, it responds to the call for more mode-balanced task-based SCMC agendas, provides a better understanding of the role of task complexity in SCMC contexts, and sheds some light on the design of tasks for technological environments.

Cognition hypothesis
Framed within a cognitive approach to second language acquisition (SLA), this study was designed under the guidance of Robinson's (2001) Cognition Hypothesis. Robinson (2001) defined task complexity as "the result of the attentional, memory, reasoning, and other information-processing demands imposed by the task structure on the language learner" (p. 29). According to Robinson (2001), intentional manipulation of task complexity could engage learners in specific patterns of mental processing and language use that facilitate acquisition. Robinson (2001) distinguished between two dimensions of task complexity in the Cognition Hypothesis, namely resource-directing and resource-dispersing, which were assumed to influence language production in different ways. Robinson (2001) predicted that increasing cognitive task demands along the resource-directing variables (e.g., +-here and now, +-reasoning demands, +-few elements) in monologic tasks (where participants perform the tasks on their own without any exchanges with others) directs learners' attention to linguistic forms in order to meet the task's conceptual or functional demands, thus inducing higher syntactic complexity, lexical complexity and accuracy. However, Robinson (2001) argued that more complex interactive tasks (where participants perform the tasks in groups of two or more by interacting with each other) along this dimension would lower the syntactic complexity of L2 production because of the multiple one-word or phrasal responses during interlocutors' negotiations.
On the other hand, Robinson (2001) predicted that increasing task complexity along resource-dispersing variables (e.g., +-planning, +-prior knowledge, and +-single task) would lead to decreased syntactic complexity, lexical complexity and accuracy in both monologic and interactive tasks in that learners' attention would be diverted away from the linguistic forms to the important aspects of accomplishing the task.
To the best of our knowledge, previous studies investigating the effects of task complexity in SCMC contexts have only examined certain resource-dispersing factors (e.g., +-task structure; +-prior knowledge) (Adams & Nik, 2014; Adams et al., 2015; Nik, 2010) and the resource-directing factor +-reasoning demands (Baralt, 2013). For this reason, this study examined the resource-directing factor +-few elements, which had never been examined in SCMC settings. This responds to the call of Smith and González-Lloret (2020) that, to truly test the viability of Robinson's Cognition Hypothesis in a technological environment, most if not all of these variables should be explored empirically.

+-Few elements
According to the Cognition Hypothesis, a simple task consists of fewer elements whereas a complex task comprises more elements. Because the Cognition Hypothesis underspecifies which component of a task should be regarded as an element, previous studies examining this factor in FTF contexts operationalized elements in a variety of ways.
For example, Robinson (2001) used two interactive map tasks where participants were instructed to provide directions to their partner. The elements took the form of city landmarks in the maps. While the simple map consisted of fewer, readily distinguishable landmarks, the complex map had more landmarks which were difficult to distinguish. Kuiken et al. (2005) operationalized the factor as the number of criteria to take into account in parallel (three in the simple task versus six in the complex task) when choosing a holiday resort. In the study by Michel et al. (2007), elements were manipulated as the number of options to choose from. Participants were asked to give advice on which electronic device to buy from two options in the simple task and six in the complex task. Révész (2011) employed fund-assigning tasks where participants debated how much money they would distribute to a project. The two tasks differed not only in the number of programs but also in the total sum of money to be allocated ($50,000.00 to three projects in the simple task versus $100,000.00 among six programs in the complex task). Kim (2020) adopted picture narration tasks which differed both in the number of characters and in the complexity of reasoning for the event (two main characters and simple reasoning for the event versus three main characters and two minor characters as well as relatively complex reasoning for the event). In Xu et al. (2023), participants were instructed to pair the best roommates. Elements were operationalized as the number of candidates and the number of characteristics of the candidates (four candidates each of whom had four characteristics in the simple task versus six candidates each of whom had six characteristics in the complex task).
This study followed the operationalization of Kuiken et al. (2005) with elements manipulated as the number of criteria to consider because it was assumed to impact learners' cognitive processing activities at the stage of conceptualization (see Levelt, 1989 for a detailed description of oral production model), which would ultimately affect L2 production at the stage of linguistic formulation (Kormos & Trebits, 2012).
Based on the above discussion, the present study aimed to examine the impact of task complexity manipulated along the resource-directing factor +-few elements on learners' L2 production via the audio-based SCMC mode. To achieve this objective, the following research question was formulated:
Research Question: How does task complexity manipulated along the resource-directing factor +-few elements affect L2 production of Chinese EFL learners performing interactive tasks via audio-based SCMC in terms of syntactic complexity, lexical complexity and accuracy?

Participants
Forty-two third-year undergraduates with Chinese as their first language (L1), who were learning English as a foreign language at a university in Northwest China, participated in this study. They were male (n = 5) and female (n = 37) adults whose ages ranged from 20 to 23. When the study was conducted in semester 1 of the 2022-2023 academic year, they had learned English for about twelve years and none of them had ever visited or lived in an English-speaking country. Their English proficiency level was identified by the V_YesNo vocabulary test (Meara & Miralpeix, 2016), which simply asks whether participants understand the meaning of a lexical item or not. Vocabulary size has been shown to be highly correlated with proficiency (Milton, 2009), and many earlier works (Gilabert et al., 2009; Vasylets et al., 2017) have used similar vocabulary tests (e.g., X_Lex and Y_Lex) to estimate learners' L2 proficiency. Based on the mean score of our participants on this test (M = 4519.95, SD = 558.123), they can be classified as lower intermediate learners, as the manual of this test suggests that intermediate level learners have a vocabulary size of 3500-6000 (Meara & Miralpeix, 2015).

Instruments
The instruments employed in the study included the audio chat of the video-conferencing platform WeMeet and two interactive decision-making tasks, which are described below.

Equipment
The software used in the study was a video-conferencing platform called WeMeet (similar to Webex), which can be easily downloaded and installed on a computer or a mobile phone. It is a versatile piece of software that combines text, audio and video communication, document file sharing, grouping, and recording, among other features, and thus fully served the purpose of the study. Furthermore, it is a simple, user-friendly and free online communication tool that was widely used in China for online teaching and learning during the Covid-19 pandemic in 2020. Participants in this study were therefore quite familiar with it and could operate it without any trouble. The task prompts were uploaded to WeMeet through its "document file sharing" feature, which enabled the participants to refer to them when necessary. After being divided into dyads via the "random grouping" feature, the participants were instructed to complete the tasks with their partner via the audio chat of WeMeet in their private meeting room, and they were also asked to record their performances via the "recording" feature.

Tasks
Two interactive decision-making tasks (simple versus complex) on similar topics were employed. Task complexity was manipulated along the resource-directing variable +-few elements. The tasks used in the current study were adapted from those used in Kuiken and Vedder (2007) and Mahpul and Oliver (2018). In the simple task, participants had to take three criteria into account (e.g., close to the university, two bedrooms, and attractive price) when choosing which one out of five apartments to rent (see Additional file 1: Appendix 1 Simple Task Materials), whereas in the complex task, they had to consider six criteria (e.g., having a garden, serene surroundings, close to the city center, availability of physical exercise facilities, swimming pool, and breakfast provided) before deciding which one out of five hotels to choose (see Additional file 2: Appendix 2 Complex Task Materials). The task materials were given in Chinese to prevent participants from borrowing lexis from the prompts.
As researchers have asserted the importance of providing validity evidence for the operationalization of task conditions in task complexity studies (Rahimi, 2019), a ten-item Likert scale self-rating questionnaire adapted from Robinson's (2001) original questionnaire was used to test whether the complex task was perceived as more cognitively demanding than the simple task, as assumed. Cronbach's alpha was used to test the reliability of the questionnaire, and the result showed a value of 0.82, indicating a high level of internal consistency.
As shown in Table 1, the participants' ratings of task difficulty for the complex task (M = 3.42, SD = 0.22) were higher than those for the simple task (M = 3.18, SD = 0.35). A paired samples t-test revealed that the difference in the perceived difficulty of the two tasks was statistically significant (t(41) = 4.01, p < 0.001) with a medium effect size (d = 0.63), indicating that the complex task was indeed perceived as more cognitively demanding than the simple task. Therefore, the task manipulation in the present study was considered valid.
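As an illustration of the reliability check described above, the following is a minimal sketch of how Cronbach's alpha can be computed from item-level ratings. The function name and the small data set are hypothetical and are not the study's questionnaire data; the study itself reports using SPSS 26.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows, each holding k item scores."""
    k = len(responses[0])
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 4 respondents x 3 items (NOT the study's data).
demo_ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [3, 3, 4],
    [5, 4, 5],
]
alpha = cronbach_alpha(demo_ratings)
print(round(alpha, 3))
```

With a ten-item scale such as the one used here, each row would simply hold ten scores instead of three.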
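The manipulation check above rests on a paired samples t-test. A minimal sketch of that computation is given below, assuming the effect size is the standardized mean difference of the paired scores (often written d_z); the paper does not state which variant of d was used, and the ratings shown are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df, d_z) for two lists of paired scores."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    sd = stdev(diffs)                   # sample SD of the pairwise differences
    t = mean(diffs) / (sd / sqrt(n))    # paired t statistic
    d_z = mean(diffs) / sd              # assumed effect-size variant; equals t / sqrt(n)
    return t, n - 1, d_z


# Hypothetical difficulty ratings for five participants (NOT the study's data).
simple_ratings = [3.0, 3.2, 2.9, 3.4, 3.1]
complex_ratings = [3.3, 3.4, 3.3, 3.5, 3.5]
t_value, df, d_z = paired_t(simple_ratings, complex_ratings)
print(round(t_value, 2), df, round(d_z, 2))
```

The p value would then be read from the t distribution with df degrees of freedom, which a statistics package such as SPSS reports directly.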

Procedure
This study employed a cross-sectional repeated measures design, in which each participant is measured more than once at one particular point in time rather than being tracked for language acquisition or development over time (Cohen et al., 2007). Participants (n = 42) were randomly divided into dyads to consecutively perform two tasks (one simple and one complex) via the audio chat of WeMeet in a laboratory setting. Following Cho (2018), participants were allowed only three minutes of pre-task planning time to read the task prompts and make their decision before the experiment started. They were allowed seven minutes to complete each task. The time allotment was determined by a pilot study. Immediately after they completed each task, they were instructed to complete the self-rating questionnaire to measure the cognitive load of the task. To rule out carryover effects, the tasks were counterbalanced. Specifically, half of the participants performed the simple task prior to the complex task, whereas the other half performed the tasks in the reverse sequence. We provided step-by-step directions (in Chinese) for participants to perform the tasks (see Additional file 3: Appendix 3 Task Direction Sheet).
WeMeet enabled each dyad to communicate in their own private meeting room. They were not allowed to communicate via text chat or turn on their video during the task performance. To avoid noise from other participants who were engaged in oral communication at the same time and to ensure the quality of the recording, participants were asked to wear earpieces connected to microphones throughout the experiment. Figure 1 summarizes the procedure of the study.
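The random pairing and counterbalancing described above could be sketched as follows. This is only an illustration under stated assumptions: the study used WeMeet's "random grouping" feature, whereas this sketch shuffles hypothetical participant IDs with a fixed seed and alternates task order across dyads. How the study split task order across its 21 dyads is not stated; alternating yields 11 dyads starting with the simple task and 10 with the complex one.

```python
import random


def assign_dyads(participant_ids, seed=2023):
    """Randomly pair participants and alternate task order across dyads."""
    ids = list(participant_ids)
    if len(ids) % 2 != 0:
        raise ValueError("an even number of participants is needed to form dyads")
    rng = random.Random(seed)  # fixed seed so the allocation can be reproduced
    rng.shuffle(ids)
    plan = []
    for index in range(0, len(ids), 2):
        dyad_number = index // 2
        # Even-numbered dyads start with the simple task, odd-numbered with the complex one.
        order = ("simple", "complex") if dyad_number % 2 == 0 else ("complex", "simple")
        plan.append({"dyad": (ids[index], ids[index + 1]), "order": order})
    return plan


plan = assign_dyads(range(1, 43))  # 42 hypothetical participant IDs
print(len(plan), plan[0]["order"], plan[1]["order"])
```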

Coding
The L2 productions under both task conditions were recorded, transcribed and analyzed in terms of syntactic complexity, lexical complexity and accuracy. Measures that constitute valid descriptors of each dimension of L2 production were used. The AS-unit (Analysis of Speech unit) was employed as the fundamental unit of analysis for the present study, following the contention of Foster et al. (2000) that it is the most appropriate unit for spoken language. Foster et al. (2000) defined it as: "A single speaker's utterance consisting of an independent clause, or sub-clausal unit, together with any subordinate clause(s) associated with either. An independent clause will be minimally a clause including a finite verb. An independent sub-clausal unit will consist of: either one or more phrases which can be elaborated to a full clause by means of recovery of ellipted elements from the context of the discourse or situation OR a minor utterance, which will be defined as one of the classes of "Irregular sentences" or "Nonsentences" identified. A subordinate clause will consist minimally of a finite or nonfinite Verb element plus at least one other clause element (Subject, Object, Complement or Adverbial)" (pp. 365-366).
Regarding the measures of syntactic complexity, the mean length of AS-unit (calculated by dividing the number of words by the number of AS-units) (e.g., Inoue, 2016) served as a measure of overall syntactic complexity (Norris & Ortega, 2009), while the number of clauses per AS-unit (calculated by dividing the number of clauses by the number of AS-units) (e.g., Fukuta & Yamashita, 2015; Santos, 2018) and the ratio of subordinate clauses to the total number of clauses (calculated by dividing the number of subordinate clauses by the total number of clauses) (e.g., Michel, 2011) reflected participants' ability to use complex syntax.
Fig. 1 The procedure of the study
Lexical complexity was measured by Guiraud's index of lexical diversity (e.g., Adams & Nik, 2014; Michel, 2011) and the Lexical Frequency Profile (LFP) index of lexical sophistication (Laufer & Nation, 1995). In comparison to the commonly used type-token ratio (TTR), Guiraud's index compensates for differences in text length by including the square root of the tokens (Vermeer, 2000). Guiraud's index was calculated with the following formula: Types / √Tokens. The LFP index reflects the quality of learners' vocabulary use (Laufer & Nation, 1995). It shows the proportion of words that learners use at different vocabulary frequency levels in their language production. The British National Corpus (BNC) word frequency list, which contains the 14,000 most frequently used word families in English, was used for this purpose (Nation, 2004). The transcripts of the participants' language productions were uploaded into and run by the Range program (Nation & Heatley, 2002), which matched words from the transcripts with the 14,000-word-family BNC wordlists. In this study, the percentage of words that do not belong to the first 1,000 most frequent words was calculated. Following Nik (2010), the fifteenth and sixteenth wordlists were excluded because they contain proper nouns and non-words. The percentage was calculated with the following formula: (Wordlists 2 to 14 / Wordlists 1 to 14) × 100.
Following previous studies, accuracy was indexed by the number of errors per 100 words (calculated with the following formula: (the number of errors / the total number of words) × 100) (e.g., Mehnert, 1998; Ruiz-Funes, 2015; Vasylets et al., 2017) and the target-like use of verbs (calculated with the following formula: the number of accurately used verbs / the total number of verbs) (e.g., Kormos & Trebits, 2012).
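The three syntactic complexity ratios defined above can be sketched as simple divisions over manually coded counts. The function name, dictionary keys and counts below are hypothetical; segmenting a transcript into AS-units and clauses remains a manual coding step that this sketch does not attempt.

```python
def syntactic_complexity(n_words, n_as_units, n_clauses, n_subordinate):
    """Compute the three ratios from counts obtained by manual coding."""
    return {
        "mean_length_of_as_unit": n_words / n_as_units,
        "clauses_per_as_unit": n_clauses / n_as_units,
        "subordination_ratio": n_subordinate / n_clauses,
    }


# Hypothetical counts for one participant in one task (NOT the study's data).
demo_syntax = syntactic_complexity(
    n_words=120, n_as_units=20, n_clauses=26, n_subordinate=5
)
print(demo_syntax)
```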
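The two lexical measures can be illustrated with a minimal sketch. It assumes that the LFP percentage is computed over word tokens (the text does not say whether tokens or types were counted) and that words outside wordlists 1 to 14 are left out of both numerator and denominator. The `band_of` mapping is a hypothetical stand-in for the BNC wordlists used by the Range program, and the band numbers in it are invented for illustration.

```python
from math import sqrt


def guiraud_index(tokens):
    """Guiraud's index: number of types divided by the square root of tokens."""
    return len(set(tokens)) / sqrt(len(tokens))


def percent_beyond_first_1000(tokens, band_of):
    """Percentage of banded tokens that fall in wordlists 2 to 14.

    band_of maps a word to its frequency band (1 = first 1,000 word families).
    Words missing from band_of, or in bands above 14, are ignored.
    """
    bands = [band_of[w] for w in tokens if w in band_of and 1 <= band_of[w] <= 14]
    beyond = sum(1 for band in bands if band >= 2)
    return 100 * beyond / len(bands)


# Hypothetical utterance and band assignments (NOT taken from the BNC lists).
demo_tokens = "the hotel has a buffet and a garden".split()
demo_bands = {"the": 1, "hotel": 1, "has": 1, "a": 1, "and": 1, "garden": 1, "buffet": 5}
demo_guiraud = guiraud_index(demo_tokens)
demo_lfp = percent_beyond_first_1000(demo_tokens, demo_bands)
print(round(demo_guiraud, 2), demo_lfp)
```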
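Both accuracy indices are ratios over manually coded counts, as the following minimal sketch shows. The function name and the counts are hypothetical; deciding what counts as an error or as a target-like verb is a coding judgment that the sketch takes as given.

```python
def accuracy_measures(n_errors, n_words, n_correct_verbs, n_verbs):
    """Errors per 100 words and proportion of target-like verbs."""
    return {
        "errors_per_100_words": n_errors / n_words * 100,
        "target_like_verb_use": n_correct_verbs / n_verbs,
    }


# Hypothetical counts for one participant in one task (NOT the study's data).
demo_accuracy = accuracy_measures(
    n_errors=9, n_words=120, n_correct_verbs=18, n_verbs=20
)
print(demo_accuracy)
```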

Reliability of coding
Apart from the word count, which was calculated automatically in the Range program, the participants' L2 productions were manually coded by the researcher. A second rater (another researcher with a PhD in Applied Linguistics) recoded a randomly selected sample of 40 percent of the transcripts (as in Rahimi & Zhang, 2017) to check interrater reliability. Cohen's kappa (1992) values indicated high intercoder agreement, ranging from a high of 100 percent for Guiraud's index to a low of 87 percent for target-like use of verbs, with only one value below 90 percent, indicating that the reliability of data coding was well above the acceptable level.
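For readers unfamiliar with the statistic, the following is a minimal sketch of Cohen's kappa for two raters, assuming categorical coding decisions such as whether each verb is target-like. The labels and data are hypothetical, and the sketch is not a reconstruction of how agreement was computed for each measure in the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long lists of categorical codes."""
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal proportions.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical codes for eight verbs (NOT the study's data).
codes_a = ["ok", "ok", "err", "ok", "err", "ok", "ok", "err"]
codes_b = ["ok", "ok", "err", "ok", "ok", "ok", "ok", "err"]
kappa = cohens_kappa(codes_a, codes_b)
print(round(kappa, 3))
```

In this example the raters agree on seven of eight items (87.5 percent raw agreement), yet kappa is lower because part of that agreement is expected by chance.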

Statistical analysis
Measures of the three dimensions of L2 production (syntactic complexity, lexical complexity and accuracy) in the simple and the complex tasks were compared using paired samples t-tests or Wilcoxon signed-rank tests, depending on the normality of the data distributions. Effect sizes were also measured, with d values of 0.20, 0.50, and 0.80 for t-tests and r values of 0.10, 0.30, and 0.50 for Wilcoxon tests considered small, medium, and large, respectively (Cohen, 1992). The statistical power analysis was performed using G*Power 3.1 (Faul et al., 2009) with α = 0.05 and power = 0.80. The sample size (n = 42) was adequate to detect medium effect sizes for all variables.
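The effect-size benchmarks above can be applied with a small helper, sketched below under two assumptions that the text does not confirm: that d for the paired t-tests is t divided by the square root of the number of pairs, and that r for the Wilcoxon test is Z divided by the square root of the number of pairs (some authors divide by the total number of observations instead). The "negligible" label for values below the small benchmark is added here for convenience and is not one of Cohen's (1992) categories.

```python
from math import sqrt


def d_from_paired_t(t, n_pairs):
    """Standardized mean difference implied by a paired t statistic (assumed variant)."""
    return t / sqrt(n_pairs)


def r_from_z(z, n_pairs):
    """Effect size r for a Wilcoxon signed-rank test (assumed convention for N)."""
    return abs(z) / sqrt(n_pairs)


def label_effect(value, small, medium, large):
    """Classify an effect size against small, medium and large benchmarks."""
    value = abs(value)
    if value >= large:
        return "large"
    if value >= medium:
        return "medium"
    if value >= small:
        return "small"
    return "negligible"  # label added for this sketch only


# t(41) = 8.84 is the LFP result reported in this study; n = 42 participants.
lfp_d = d_from_paired_t(8.84, 42)
print(round(lfp_d, 2), label_effect(lfp_d, 0.20, 0.50, 0.80))
print(label_effect(0.25, 0.10, 0.30, 0.50))
```

Under these assumptions, the reported t(41) = 8.84 for LFP corresponds to the reported d = 1.36.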

Results
Table 2 presents the descriptive statistics for the dimensions of the participants' L2 production under the simple and the complex task conditions. Syntactic complexity appeared to show an overall increase as a result of increased task complexity. For the mean length of AS-unit, the mean score was M = 5.81, SD = 1.35 for the simple task and M = 6.01, SD = 1.46 for the complex task. With regard to the number of clauses per AS-unit, the mean scores were M = 1.29, SD = 0.18 and M = 1.32, SD = 0.17 for the simple task and the complex task respectively. Regarding the subordination measure, the mean value for the simple task was M = 0.20, SD = 0.10, and it appeared to be slightly higher when the task was more cognitively demanding (M = 0.22, SD = 0.10). The same trend was also observed for lexical complexity in that the mean scores for the complex task were higher both in Guiraud's index (simple task: M = 5.74, SD = 0.59; complex task: M = 5.78, SD = 0.68) and in LFP (simple task: M = 4.72, SD = 2.23; complex task: M = 8.96, SD = 2.29). As for accuracy, hardly any noticeable change in errors per 100 words was found between the complex task (M = 8.59, SD = 3.75) and the simple task (M = 8.62, SD = 3.65). In the same vein, similar values were obtained for the simple task (M = 0.90, SD = 0.07) and the complex task (M = 0.90, SD = 0.07) in the target-like use of verbs.
The inferential statistics displayed in Table 3 revealed that none of these changes in syntactic complexity were statistically significant, with p = 0.24, d = 0.18 for the mean length of AS-unit, p = 0.41, d = 0.11 for the number of clauses per AS-unit, and p = 0.10, r = 0.25 for the ratio of subordinate clauses to the total number of clauses. As far as lexical complexity was concerned, significant effects were detected only for LFP, t(41) = 8.84, p < 0.001, with a large effect size (d = 1.36), whereas Guiraud's index was not significantly affected (p = 0.66, d = 0.07), suggesting that increasing the number of elements elicited the use of more sophisticated words. Turning to accuracy, our initial visual observations were confirmed because no significant effects were found for either errors per 100 words (p = 0.96, d = 0.01) or target-like use of verbs (p = 0.90, d = 0.02).

Discussion
This study took the initiative to investigate the effects of task complexity manipulated along the resource-directing factor +-few elements on Chinese EFL learners' L2 production in terms of syntactic complexity, lexical complexity and accuracy. The results revealed that task complexity, as operationalized in the present study, hardly resulted in any significant differences in L2 performance, with the exception of lexical complexity as measured by LFP, which showed an increase in the complex task. These results run counter to the predictions of the Cognition Hypothesis (Robinson, 2001) that enhanced task complexity along the resource-directing dimension leads to higher accuracy but lower complexity in interactive tasks.
The minor effects of task complexity in the audio-based SCMC could partially be attributed to the lower L2 proficiency of the participants. According to Kuiken et al. (2005), the effects of task complexity may be less pronounced for learners with lower proficiency levels, as they may not have attained the threshold level of L2 proficiency needed to dedicate their attention to the increased task complexity. In other words, for lower L2 proficiency learners, the simple task may already have been difficult enough to accomplish. Whether they were capable of dealing with the increase in task complexity is in doubt. Kormos and Trebits (2011) confirmed this claim, suggesting that it was important that learners reach a certain proficiency threshold for the effects of task complexity to be detectable. As no study has been conducted to examine the task complexity variable +-few elements in SCMC modes, we made tentative comparisons of the current findings with those of previous research examining the effects of +-few elements on oral L2 production. In what follows, the current results are discussed in terms of syntactic complexity, lexical complexity and accuracy respectively.
Qian and Shamsudin Asian. J. Second. Foreign. Lang. Educ. (2023) 8:48
Regarding syntactic complexity, the current findings aligned with those of Michel et al. (2012). Nonetheless, they were inconsistent with those of Révész (2011), who detected lower syntactic complexity in the more complex interactive task. The discrepant findings might be ascribed to the different tasks used in the present study and in Révész (2011). While the cognitive complexity of the two tasks in our study was differentiated along the resource-directing factor +-few elements, the two versions of the tasks used in Révész (2011) were intended to differ along both +-reasoning (a resource-directing factor in the Cognition Hypothesis) and +-few elements. Therefore, it would be difficult to decide whether the negative task complexity effects on syntactic complexity in Révész (2011) were the consequence of the increase in reasoning, the increase in elements, or the combination of both. A possible explanation for the absence of significant effects on syntactic complexity in the present study is that the SCMC mode (i.e., audio chat) via which the participants performed the tasks may have mediated the way the participants used their language. Specifically, unlike FTF communication, where interlocutors can resort to facial expressions or body language to interpret one another's true feelings if they have a hard time understanding their interlocutors' verbal language, audio chat hardly provides such visual cues. As a result, learners were likely to be very focused on listening so as to comprehend what their interlocutors conveyed. Meanwhile, they also had to organize and formulate their thoughts to maintain the flow of the conversation. Consequently, in order to ensure a quick delivery of the intended message, they may have chosen to employ the simplest syntactic structures in their language use.
Concerning lexical complexity, positive effects of task complexity were observed on the LFP measure. This finding supported the claim of previous researchers that limited attentional resources are first directed to those elements that carry the bulk of message meaning, primarily the lexicon, and then to the communicatively redundant formal features of language (Lee et al., 1997; VanPatten, 1994). Excerpt 1 from the simple task transcription of dyad 6 and Excerpt 2 from the complex task transcription of the same dyad illustrate the higher LFP in the complex task, which was measured by calculating the percentage of words that do not belong to the first 1,000 most frequent words of the British National Corpus (BNC) word frequency list, which contains the 14,000 most frequently used word families in English (Nation, 2004). The words beyond the first 1,000 words used in the two tasks are in bold in the following Excerpt 1 and Excerpt 2.
Excerpt 1:
As shown in Excerpt 1 and Excerpt 2, this dyad used noticeably more sophisticated words in the complex task than in the simple task. It is possible that the higher LFP in the complex task resulted from the nature of the elements they were asked to talk about rather than from the fact that they had to take into account a greater number of elements (Rahimi, 2019). For example, the input of the simple task concerned high-frequency everyday words (such as bedroom, water, electricity), whereas the complex task material involved more sophisticated vocabulary (such as anti-allergic, buffet, food court). Therefore, the differences in lexical sophistication between the two tasks need not necessarily be attributed to the increase in the number of elements; rather, they might result from the goal and requirements of the task (Pallotti, 2009).
The absence of significant effects of task complexity on accuracy in the present study was supported by Michel (2011), Michel et al. (2012) and Kim (2020).However, this finding ran counter to those of Révész (2011) which observed higher accuracy in the complex interactive oral task.As discussed earlier, Révész (2011) used different tasks than the present study, which might cause the different findings.What is more, the indices of accuracy in Révész (2011), including errors per AS unit, error free AS units per AS unit, and self-repairs per errors, were also different from those used in the present study.In the study of Révész (2011), errors per AS unit and error-free AS units were found significant.Nevertheless, these two measures were argued to be problematic in gauging accuracy.To be more specific, errors per AS unit neglected unit length (Skehan & Foster, 2008).Consequently, when compared to the number of errors in lengthier AS units, disproportionately smaller number of errors may seem to exist in short AS units.The method of error-free AS units was criticized for obscuring the true numbers of errors in L2 production because an AS unit with a single error was treated the same as an AS unit with multiple errors (Bardovi-Harlig & Bofman, 1989).
Again, the results concerning accuracy might be related to the features of audio chat. The participants were forced to concentrate on listening because of the absence of visual cues that could have helped the interlocutors understand each other. They were also expected to respond promptly in real-time oral communication. As a result, listening for comprehension and delivering the intended message under time pressure may already have been cognitively taxing, taking up so much of the participants' limited attentional resources (Skehan, 1996) that little attention was left for monitoring their output, leaving errors undetected and uncorrected.
To recap, it is possible that increasing task complexity along the resource-directing factor of elements, which was assumed to direct participants' attention to language forms (Robinson, 2001), failed to do so in audio SCMC. It seems that, to some extent, the absence of visual aids and the lack of planning opportunities (a resource-dispersing factor in the Cognition Hypothesis) in audio SCMC diverted attentional resources away from linguistic aspects towards other information-processing activities needed to maintain spontaneous online communication. This may have neutralized the effects of the resource-directing factor (i.e., more elements in this study), which would explain the similar performance across the two tasks. This seems to confirm the assertion of Robinson and Gilabert (2007) that the beneficial effects of increased task complexity along resource-directing factors (e.g., elements) might be weakened or neutralized if the task is simultaneously kept demanding along resource-dispersing factors (e.g., planning time).

Conclusion
This study was the very first to examine the effects of task complexity in an audio-based SCMC environment. Although theoretically inspired by Robinson's Cognition Hypothesis, it yielded results that contradicted the hypothesis. The study is meaningful in that it provides empirical evidence that the Cognition Hypothesis was not applicable to an audio-based SCMC environment, suggesting that the audio chat might have mediated learners' cognitive processing and consequently affected their use of language during task performance. Following from this, we would tentatively suggest that it might be inadvisable for teachers to design tasks by manipulating task complexity under the guidelines of the Cognition Hypothesis in the hope of improving learners' L2 output when audio-based SCMC is used as the interaction medium.
It is worth noting that the results of the present study should be interpreted with caution because of its limitations. First and foremost, it was conducted in a context where English was learnt as a foreign language. The participants seldom communicated in English outside English classrooms, which limited their opportunities to practice spoken English. A further limitation concerns the gender and age of the participants, most of whom were female adults aged 20 to 23 years. Furthermore, as mentioned in the earlier sections, the construct of elements has been operationalized in a variety of ways. This study followed Kuiken and Vedder (2007), who operationalized it as the number of requirements to consider. Different results might have been obtained had a different operationalization been used (e.g., the number of options to choose from, as in Michel et al., 2007). Last but not least, this study only examined the variable of +-few elements in audio chat. How this factor would affect L2 production in text chat and video chat remains an empirical issue. Future studies may seek to elucidate these issues.
Fig. 1 The procedure of the study


Table 1
Results for the self-rating questionnaire

Table 2
Descriptive statistics of L2 production in simple task and complex task (N = 42)
MLAS = mean length of AS unit; CAS = clauses per AS unit; RSC = ratio of subordinate clauses to the total number of clauses; E/100W = errors per 100 words; TLV = target-like verbs

Table 3
Results of paired-samples t-tests and Wilcoxon Signed Ranks test for the measures of L2 production

Transcription of simple task performance done by dyad 6
MAO: I look for the five hotels about our um…
REN: What hotel do you want to live in?
MAO: I want to live in the No.5 hotel, the last one.
REN: But I want to live in the fourth hotel.
MAO: Can you tell me why do you want to choose the fourth?
REN: Oh, I think it's located for those who seek quiet holidays on the beach. We can also go hiking in the mountains.
MAO: I think the fifth has a breakfast with garden. We can get breakfast in the hotel and it is convenient for us to live in it.
REN: But I think the fourth have many shops around it, which can make our shopping more convenient and we can have fun. Do you think so?
MAO: I agree with you, but I think the last one have the quiet location. We can enjoy our vacation best.
REN: Um, your suggestion sounds wonderful, but I want to argue because it location at a considerable distance from the city center and it close to the duty-free shop. We are always buy some extravagant clothes and some things. I think it is very indispen-