Measuring the contribution of specialist vocabulary knowledge to academic achievement: disentangling effects of multiple types of word knowledge

This study investigates whether knowledge of specialist subject vocabulary can make a significant and measurable impact on academic performance, separate from and additional to the impact of general and academic vocabulary knowledge. It tests the suggestion of Hyland and Tse (TESOL Quarterly, 41:235–253, 2007) that specialist vocabulary should be given more attention in teaching. Three types of vocabulary knowledge, general, academic and specialist business vocabulary, are tested against GPA and business module scores among students of business at a college in Egypt. The results show that while general vocabulary size explains the greatest share of variance in the academic success measures, the other two factors, academic and specialist business vocabulary, make separate and additional contributions. The contribution to the explanation of variance made by specialist vocabulary knowledge is about double that of academic vocabulary knowledge.


Introduction and background
For learners developing the ability to study successfully in a foreign language, the importance of acquiring a lexicon of an appropriate size is beyond dispute. It has proved less easy to define exactly which words need to be learned and the order in which they might best be acquired. Milton and Benn (1933) noted a huge variety in the vocabulary content of textbooks, suggesting little common ground in the principles of vocabulary instruction and making it difficult to plan higher levels of teaching when learners could have so little in common at the basic levels. Decades later, in a study of modern textbooks, Catalán and Francisco (2008) observe that writers continue to make idiosyncratic choices as to the number of words presented and the selection of words.
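To help clarify this difficulty, Nation (2001) identifies four different types of vocabulary: high-frequency words, academic words, technical words and low-frequency words. Technical, or specialist, words are closely tied to a particular subject area and have little use in other subject specialisms. These are words like acetyl and accumulator, which are relevant in engineering but for which there is probably little use in, say, linguistics or management studies.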
Academic vocabulary has become synonymous in recent years with Coxhead's (2000) Academic Word List (AWL). The AWL is a list of 570 headwords which are not included in West's General Service List (GSL), used as a proxy for highly frequent, basic vocabulary, but which are disproportionately frequent in Coxhead's academic corpus when compared with the British National Corpus (BNC). Its contribution to coverage gives an idea of its potential importance: in Coxhead's academic corpus these 570 words add about 10% to coverage, while in more general discourse their coverage is a fraction of this amount (Konstantakis, 2007).
The AWL has been adopted uncritically in EFL teaching, and Gardner and Davies (2014) suggest it plays a central role in school success. However, it has been the subject of some reconsideration in recent years. Dang and Webb (2014) point out, for example, that its contribution to coverage in spoken academic discourse is not nearly as great as in written discourse. There is an issue, too, in that the AWL does not appear to contribute equally to coverage across different subject specialisms: Coxhead's (2000) own figures suggest it is more relevant in her commerce sub-corpus than in her science sub-corpus. Eldridge (2008) questions the use of West's GSL as a list of basic vocabulary, since it is both rather old and problematic. Not least of its problems is its size, at just under 2000 items, which means, as Masrai & Milton (2018) point out, that the AWL can be difficult to distinguish from the highly frequent vocabulary of other general corpora. Masrai & Milton note that 65% of the AWL falls within the BNC's most frequent 3000 words, and 84% within the first 3000 words of the BNC/COCA lists. They reflect that had Coxhead chosen a different list to represent basic vocabulary, such as the most frequent 3000 words, as Schmitt and Schmitt (2014) suggest, or the Oxford 3000 list (2020), the AWL she derived would have been very different. She may even have concluded, as Cobb and Horst (2004) did of French, that there is no specialist academic lexicon.
There have been a number of recent attempts to assess the importance of AWL knowledge to academic success by calculating how much of the variance in Grade Point Average (GPA) it explains. General vocabulary knowledge is considered to have a significant impact on academic success, and Roche and Harrington (2013) report that vocabulary knowledge accounts for 25% of the variance in GPA. Townsend et al. (2012) find that its contribution varies according to academic genre, with figures between 26% and 43%. Townsend et al. further calculate the impact of the AWL on academic achievement and conclude that knowledge of this list can add between 2% and 7% to the explanatory power of general vocabulary alone. Masrai & Milton (in press), who use a vocabulary measurement technique designed to exclude frequency effects when calculating the effect of AWL knowledge, report that general vocabulary can explain 46.7% of the variance in the GPA of language and translation students in Saudi Arabia, and that knowledge of the AWL can explain a further 11.5%. Notwithstanding the issues reported above, therefore, they conclude that knowledge of the AWL does make an important contribution to academic success additional to general vocabulary knowledge. A study by Schuth, Köhne, and Weinert (2017) with 173 German fourth graders also found that academic vocabulary knowledge significantly predicted variance in the learners' academic performance.
It is not entirely clear why the AWL makes this extra contribution to variance when, at first sight, it appears to be largely a list of frequent words. Masrai & Milton (in press) speculate that the benefit may come not so much from the words themselves as from the phrases they occur in. Hyland and Tse (2007) suggest that words in the AWL may be polysemous: in addition to the meanings these words have in common usage, some may also have more specialist meanings which are useful in individual subject disciplines. This can help explain not only the additional coverage AWL words provide compared with other words of equivalent frequency in general corpora, but also why coverage of the AWL varies from one academic discipline to another. Hyland and Tse suggest that more attention should be paid to specialist word lists. However, there appears to be no research, comparable to the work on the AWL, which attempts to estimate the impact of specialist vocabulary knowledge on academic success.
It is hard to identify a specialist vocabulary list with the prominence of the AWL and, where such lists do exist, the principles behind their word selection and their impact on coverage and academic study can be hard to discern. Konstantakis (2007) examines the Oxford Business Wordlist (Parkinson, 2005) and points out that about half of its words can be found in the most frequent 2000 words of the BNC, and a further quarter in the AWL. Only about a quarter of the words might be argued to be specialist business words. Konstantakis (2007) then attempts to put together a more principled business word list, excluding words which are in either the GSL or the AWL even when they are frequent in his business corpus. The 498-word list he produces contributes only 2.5% to coverage in addition to that provided by the GSL and AWL, and the three lists combined cannot reach the 98% coverage thought to be needed for fluent comprehension. There may be issues with this methodology. Ward (1999) avoids using the GSL, which, he points out, includes many words, such as toe, east and stolen, that have little relevance to the engineering specialism he is interested in. Examining his engineering textbook corpus, Ward argues that the most frequent 2000 word families alone give nearly 96% coverage.
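Coverage figures of this kind are proportions of running words. A minimal sketch, assuming lower-cased word-form tokens (the studies cited usually count word families, which this sketch does not attempt) and hypothetical toy lists standing in for a general list, an academic list and a business list, shows how each list's added coverage and the cumulative total can be computed and checked against the 98% threshold:

```python
def cumulative_coverage(tokens, word_lists):
    """Percentage of running tokens covered as each word list is added in turn.

    word_lists: ordered (name, set_of_words) pairs, e.g. general, academic, specialist.
    Returns (name, percentage added by this list, cumulative percentage) tuples.
    """
    covered = set()
    previous = 0.0
    results = []
    for name, words in word_lists:
        covered |= set(words)
        hits = sum(1 for token in tokens if token in covered)
        cumulative = 100 * hits / len(tokens)
        results.append((name, cumulative - previous, cumulative))
        previous = cumulative
    return results


# Toy text and toy lists (not the real GSL, AWL or BWL)
tokens = "the firm will issue new equity and the bank will".split()
lists = [
    ("toy_general", {"the", "will", "new", "and", "bank"}),
    ("toy_academic", {"issue"}),
    ("toy_business", {"equity"}),
]
for name, added, total in cumulative_coverage(tokens, lists):
    print(f"{name}: +{added:.1f}% -> {total:.1f}%")
reaches_threshold = cumulative_coverage(tokens, lists)[-1][2] >= 98.0
print("98% coverage reached:", reaches_threshold)
```

Because coverage is computed over the union of lists entered so far, a word appearing in more than one list is only counted once, which mirrors the "in addition to" framing of the figures above.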
It is not yet clear, therefore, just how important specialist vocabulary is to academic success, since the methods used to investigate it vary and the coverage figures for these lists vary so much. This study therefore attempts to investigate the contribution of knowledge of one specialist list, Konstantakis's (2007) Business Word List (BWL), to variance in the GPA of university business students, and to consider this contribution in relation to those made by general vocabulary size and knowledge of the AWL.

The current study
This study investigates whether specialist subject vocabulary knowledge can have a significant and measurable impact on academic performance, separate from and additional to the impact of general and academic vocabulary knowledge. Two broad aims guided the study. The first is to construct a test of specialist vocabulary knowledge which controls for frequency and can therefore give meaningful results when run in conjunction with tests of general vocabulary size and knowledge of the Academic Word List. Konstantakis's (2007) Business Word List is used as the specialist vocabulary list. The second is to investigate whether knowledge of a specialist vocabulary makes a contribution to academic success additional to general vocabulary size and knowledge of academic vocabulary. To achieve these aims, the following research questions were addressed:
1. To what extent do specialist vocabulary knowledge, general vocabulary size and knowledge of academic vocabulary correlate with GPA and with specialist business module scores (BusGPA)?
2. Does factor analysis show these separate elements of vocabulary knowledge to be, effectively, a single vocabulary knowledge factor, or do they function as separate factors?
3. What are the separate and combined contributions of these types of vocabulary knowledge to variance in students' GPA scores and in specialist business module scores?

Participants
Participants in the study were 94 students recruited from the college of business administration at a private university in Egypt. They were third- and fourth-year students enrolled in a 4-year bachelor's degree program. They were 41 males and 43 females, with an average age of 22 (M = 21.79; SD = 1.25). To be accepted to the program, students must be at level A2+ of the Common European Framework of Reference (CEFR), as measured with the Cambridge English Placement Test. Their courses were taught and assessed through the medium of English. The participants were informed about the purpose of the study, and consent to participate was obtained prior to test administration.

Academic vocabulary test
The academic vocabulary size test (AVST) (Masrai & Milton, 2018) was used to measure academic vocabulary knowledge. This test was designed to assess receptive knowledge of the words in the AWL (Coxhead, 2000). It contains 114 academic words and 19 additional control items used to adjust for guesswork. The control items are real words listed in Webster's Third New International Dictionary (1961) but are highly infrequent, lying beyond the 25,000-word level in Thorndike and Lorge's (1944) word list. They were chosen because Masrai & Milton (2013) suggest that knowledge of these items among the native-speaking university population is negligible, so L2 learners cannot usually be expected to know them. Masrai & Milton (2018) demonstrate, on the basis of Rasch analysis, that the AVST is a valid measure of receptive academic vocabulary knowledge. The test features a high sampling rate of 1:5: its 114 items are sampled from the AWL's 570 headwords and should reasonably represent knowledge of the list as a whole. The test can be completed in about 10 min.
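The logic of a 1:5 sampling rate can be illustrated by extrapolating a sample score to the whole list. This is a hedged sketch, not a procedure reported for the AVST: it assumes the score is the number of sampled items known and that items were sampled at random, clamps the score to the range 0 to 114 (a hypothetical choice for negative adjusted scores), and adds a conventional binomial standard error with a finite population correction, which shrinks as the sampling rate rises.

```python
import math

LIST_SIZE = 570    # AWL headwords
SAMPLE_SIZE = 114  # AVST target items (1:5 sampling rate)


def extrapolate_list_knowledge(score, list_size=LIST_SIZE, sample_size=SAMPLE_SIZE):
    """Estimate how many list words are known from a score on a random sample.

    Hypothetical helper: assumes items were sampled at random from the list.
    Returns (estimated words known, approximate standard error in words).
    """
    known = min(max(score, 0), sample_size)  # clamp negative or out-of-range scores
    p = known / sample_size
    estimate = known * list_size / sample_size
    # finite population correction: sampling 1 in 5 leaves less uncertainty
    fpc = math.sqrt((list_size - sample_size) / (list_size - 1))
    standard_error = list_size * math.sqrt(p * (1 - p) / sample_size) * fpc
    return estimate, standard_error


est, se = extrapolate_list_knowledge(90)
print(f"about {est:.0f} of {LIST_SIZE} headwords known (SE ~ {se:.0f})")
```

With a sample of 114 from 570, the correction factor is about 0.9, so the estimate is somewhat tighter than the same sample drawn from a much larger list would give.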

General vocabulary test
The general vocabulary knowledge test (GVST) is taken from Masrai & Milton (in press) and is designed to control for frequency effects when estimating the contribution of general and academic vocabulary knowledge to learners' academic achievement. Masrai & Milton observe that any test of the AWL will contain items spread across the frequency bands of general corpora and is therefore likely to function as a test of general vocabulary size. To separate the effects of academic word knowledge from those of general vocabulary knowledge, this frequency influence must be controlled, and the GVST does this by using the frequency characteristics of the AWL in a principled selection of general vocabulary test items drawn from the BNC/COCA frequency lists. One hundred fourteen items were selected from the first to the sixth 1000-word frequency bands of BNC/COCA, matching the distribution of the AVST items within these bands. The GVST items were also matched with the AVST items in part of speech. As in the AVST, a further 19 control items were added to control for guesswork; these were drawn from the same list of control items used in the AVST. Masrai & Milton (in press) demonstrate that the AVST and GVST are collinear and that the AVST therefore functions as a test of general vocabulary. There are no AWL items in the GVST, so this test cannot function as a test of academic vocabulary; however, the AVST is made up of items found in general vocabulary corpora, so it must also function as a test of general vocabulary size. This test structure allows any additional effect on academic performance created by knowledge of academic vocabulary to be determined using regression analysis. The GVST can also be completed in about 10 min.
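The frequency and part-of-speech matching described above amounts to stratified sampling. A minimal sketch, assuming each reference (AVST) item is tagged with a (frequency band, part of speech) stratum and that candidate general words are grouped the same way; the function name, data layout and toy words are hypothetical, and the authors' actual item-selection procedure may differ.

```python
import random
from collections import Counter


def matched_sample(reference_strata, candidates_by_stratum, excluded, seed=0):
    """Draw one candidate per reference item so that stratum counts match exactly.

    reference_strata: one (band, part_of_speech) tuple per reference item
    candidates_by_stratum: dict mapping each stratum to candidate words
    excluded: words that must never be drawn (e.g. AWL words for the GVST)
    """
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    sample = []
    for stratum, count in sorted(Counter(reference_strata).items()):
        pool = [w for w in candidates_by_stratum.get(stratum, []) if w not in excluded]
        if len(pool) < count:
            raise ValueError(f"stratum {stratum}: need {count}, only {len(pool)} available")
        sample.extend((stratum, word) for word in rng.sample(pool, count))
    return sample


# Toy usage: three reference items in two strata, with a toy exclusion set
reference = [(1, "noun"), (1, "noun"), (3, "verb")]
candidates = {
    (1, "noun"): ["house", "area", "water", "time"],
    (3, "verb"): ["borrow", "analyse", "whisper"],
}
awl_words = {"area", "analyse"}
print(matched_sample(reference, candidates, excluded=awl_words))
```

The same function could in principle serve the specialist test by passing the business list as the candidate pool, which is the sense in which the three tests share one sampling frame.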

Specialist business vocabulary test
For the purpose of this study, a test of specialist vocabulary (BVST) was designed following the same principles as the GVST and AVST, using Konstantakis's (2007) Business Word List. The test is intended to estimate receptive knowledge of these words. Konstantakis's list includes 498 items and was constructed principally to exclude items from the GSL and AWL. One hundred fourteen items were therefore selected from the business word list with a frequency distribution matching that of the academic vocabulary size test (AVST) (Masrai & Milton, 2018) and the general vocabulary test (GVST) (Masrai & Milton, in press) across the BNC/COCA frequency lists. A further 19 control items were added to control for guesswork; these were drawn from the same list from which the control items in the GVST and AVST were selected. Following this procedure, we assume that, in principle, scores on the three tests ought not to be affected by the frequency of the test items, and any contribution they make to learners' academic attainment might be attributed to the type of word knowledge. The three measures are scored in the same way, as described by Masrai & Milton (2018, in press). Each yes response to a target item earns 1 point and each yes response to a control item deducts 6 points. The 19 control items therefore counterbalance the 114 target items (19 × 6 = 114), so a test-taker who answers yes to every item scores zero. The maximum possible score is 114, obtained by knowing all 114 target items and rejecting all 19 control items. The test takes only about 10 min to complete.
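The scoring rule can be expressed as a short function. This sketch follows the rule as stated (one point per yes to a target item, six points deducted per yes to a control item); the function and constant names are hypothetical, and it leaves negative scores as they are because the text does not say how these were handled.

```python
TARGET_ITEMS = 114
CONTROL_ITEMS = 19
CONTROL_PENALTY = 6  # 19 x 6 = 114, so answering "yes" to everything nets zero


def adjusted_score(target_responses, control_responses):
    """Yes/No test score: +1 per 'yes' to a real target word,
    -6 per 'yes' to a control item. Responses are 1 (yes) or 0 (no)."""
    if len(target_responses) != TARGET_ITEMS or len(control_responses) != CONTROL_ITEMS:
        raise ValueError("expected 114 target and 19 control responses")
    return sum(target_responses) - CONTROL_PENALTY * sum(control_responses)


# A test-taker who claims 100 target words and also claims 2 control items
print(adjusted_score([1] * 100 + [0] * 14, [1] * 2 + [0] * 17))  # 100 - 12 = 88
```

The heavy penalty makes over-claiming costly: claiming one control item wipes out six target items, which is why uncritical "yes" answering cannot inflate the score.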

Academic achievement measures
As in the studies of Masrai & Milton (2018) and Roche and Harrington (2013), GPA was collected from the participants' records and used as a measure of their general academic achievement. GPA summarises performance across a number of modules, not all of which are subject specialisms. A specialist subject measure (BusGPA) was therefore also derived from the students' scores on specialist business English modules.

Procedure
The three study measures, GVST, AVST and BVST, were administered to the participants during their usual class time, after permission had been secured from the college management and course instructors, and with the agreement of the students. The tests were administered near the end of the summer semester. As each test takes only about 10 min on average, all three were completed in a single session. The participants' GPAs and their scores on the specialist business modules were collected after the students had completed their final examinations.

Data analysis
The data were first screened and processed in Excel spreadsheets. Responses on the three vocabulary measures were scored dichotomously, as one or zero. The responses to target items were summed and entered in a separate column as a total raw score. The responses to control items were summed in the same way, and this sum was multiplied by 6. An adjusted test score was then obtained by subtracting the control-item score from the target-item score. The data were then submitted to SPSS (v25) for analysis. Descriptive statistics were computed first to give an overview of the participants' performance on the study measures, and reliability analysis was then run to examine the performance of the vocabulary measures.
To examine the associations between the vocabulary measures and the students' GPAs and performance on the specialist course, Pearson and Spearman's rho correlations were calculated. Multiple regression analysis was then performed to quantify the unique contribution of each of the three vocabulary tests to the learners' achievement on the dependent variables.

Summary scores and reliability calculations
Reliability analyses were run to examine the performance of the vocabulary measures. The reliability indices of the tests were as follows: GVST, α = .96; AVST, α = .95; BVST, α = .95. A summary of scores on the test measures is provided in Table 1.
The results in Table 1 show that the participants have greater knowledge of general vocabulary than of academic and specialist vocabulary. They also appear to know more academic words than specialist words.
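The α values reported are presumably Cronbach's alpha, although the text does not name the coefficient or say which items it was computed over. As a sketch of how such a coefficient is obtained from a respondents-by-items matrix of 0/1 responses, assuming Cronbach's formula α = k/(k − 1) × (1 − sum of item variances / variance of total scores):

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Population variances are used throughout; the n in each denominator cancels,
    so sample variances would give the same result.
    """
    k = len(responses[0])
    items = list(zip(*responses))  # transpose to one tuple per item
    item_variance_sum = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Toy data: four respondents, three dichotomously scored items
data = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
print(round(cronbach_alpha(data), 2))  # 0.75
```

Alpha rises as items covary more strongly; with 114 target items per test, values around .95 are consistent with items that largely rank test-takers the same way.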

Inter-test correlations
Correlations between the tests and assessment measures are presented in Table 2. Pearson's correlation coefficient is used except for pairs involving GPA: GPA is treated as ordinal rather than interval data, so Spearman's rho is calculated for these pairs.
The correlation coefficients suggest that general vocabulary knowledge is the strongest correlate of both GPA and specialist subject scores. Knowledge of specialist subject vocabulary correlates more strongly with GPA than knowledge of academic vocabulary does, but the two variables show similar correlations with the specialist subject scores.
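The difference between the two coefficients can be illustrated with a dependency-free sketch: Pearson's r works on the raw values, while Spearman's rho is Pearson's r computed on ranks, with tied values given their average rank. The function names and toy scores are illustrative; statistical packages such as SPSS compute the same coefficients together with significance tests, which this sketch omits.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as Pearson's r on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


vocab = [60, 75, 82, 90, 104]    # toy vocabulary scores
gpa = [2.1, 2.4, 2.4, 3.0, 3.9]  # toy GPAs with a tie
print(round(pearson(vocab, gpa), 3), round(spearman(vocab, gpa), 3))
```

Because rho depends only on rank order, it does not assume equal intervals between GPA values, which is the reason given for using it with GPA.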

Factor analysis
Results of the factor analysis involving the three vocabulary tests (GVST, AVST and BVST) are shown in the scree plot in Fig. 1 and the component matrix in Table 3. The scree plot reveals a single factor with an eigenvalue greater than 1, and the component matrix confirms that the three vocabulary tests measure a single factor.
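The retention rule referred to here, keeping components whose eigenvalue exceeds 1 (the Kaiser criterion), can be sketched by extracting the eigenvalues of the tests' correlation matrix. The sketch uses the classical Jacobi rotation method in plain Python; the correlation matrix below is hypothetical and is not taken from Table 2 or Table 3.

```python
import math


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def jacobi_eigenvalues(matrix, tol=1e-12, max_rotations=100):
    """Eigenvalues of a symmetric matrix by repeated Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_rotations):
        # find the largest off-diagonal element
        p, q = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda ij: abs(a[ij[0]][ij[1]]))
        if abs(a[p][q]) < tol:
            break
        # rotation angle that zeroes a[p][q]
        theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
        c, s = math.cos(theta), math.sin(theta)
        rot = [[float(i == j) for j in range(n)] for i in range(n)]
        rot[p][p] = rot[q][q] = c
        rot[p][q], rot[q][p] = s, -s
        rot_t = [list(col) for col in zip(*rot)]
        a = matmul(rot_t, matmul(a, rot))
    return sorted((a[i][i] for i in range(n)), reverse=True)


# Hypothetical correlations among GVST, AVST and BVST
corr = [[1.00, 0.60, 0.53],
        [0.60, 1.00, 0.74],
        [0.53, 0.74, 1.00]]
eigenvalues = jacobi_eigenvalues(corr)
retained = [ev for ev in eigenvalues if ev > 1]
print([round(ev, 3) for ev in eigenvalues], "components retained:", len(retained))
```

For three positively correlated tests the eigenvalues sum to 3, so one large eigenvalue leaves little for the others; this is why moderately to strongly correlated tests tend to yield a single retained component.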

Regression analyses
The correlations show that the three vocabulary variables (GVST, AVST and BVST) share significant but moderate to weak correlations with the two academic performance variables (GPA and BusGPA). GVST has the strongest correlation with both academic performance variables.
The regression analysis in Table 4 shows that GVST scores explain more of the variation in GPA, 14.2%, than the other vocabulary tests (AVST and BVST). In this model BVST contributes an additional 1% and AVST an additional 0.5% to the explanation of variance. The regression analysis in Table 5 shows that GVST scores explain more of the variation in BusGPA, 18.0%, than the other vocabulary tests. In this model BVST contributes an additional 0.9% and AVST an additional 0.4% to the explanation of variance.
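The increments reported here are changes in R² as predictors are entered in sequence. A dependency-free sketch of that calculation, assuming the entry order GVST, then BVST, then AVST (the order in which the contributions are reported; the tables' actual entry order is not stated) and using simulated rather than real data:

```python
import random


def r_squared(predictors, y):
    """R^2 of an ordinary least squares fit of y on predictor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(X[0])
    # augmented normal equations [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    coef = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(coef, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def hierarchical_r2(steps, y):
    """Enter (name, column) steps one at a time; report R^2 and its change."""
    entered, previous, report = [], 0.0, []
    for name, column in steps:
        entered.append(column)
        current = r_squared(entered, y)
        report.append((name, current, current - previous))
        previous = current
    return report


# Simulated scores for 94 students (illustration only, not the study's data)
rng = random.Random(42)
gvst = [rng.gauss(95, 10) for _ in range(94)]
bvst = [0.6 * g + rng.gauss(0, 8) for g in gvst]
avst = [0.5 * g + 0.3 * b + rng.gauss(0, 8) for g, b in zip(gvst, bvst)]
gpa = [0.02 * g + 0.01 * b + rng.gauss(0, 0.5) for g, b in zip(gvst, bvst)]
for name, r2, change in hierarchical_r2([("GVST", gvst), ("BVST", bvst), ("AVST", avst)], gpa):
    print(f"{name}: R^2 = {r2:.3f}, change = {change:.3f}")
```

Because the predictors are correlated, each later increment depends on what has already been entered; a different entry order would apportion the shared variance differently.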

Vocabulary test correlations and factor analysis
The three vocabulary measures are constructed to assess knowledge of three different vocabulary types. However, all three tests necessarily select words across a range of frequency bands in general corpora and must therefore also function as tests of general vocabulary size. The expectation is that these tests will be reliable and will also correlate strongly. In a previous study (Masrai & Milton, in press), two of these tests, GVST and AVST, performed reliably and proved to be collinear.
In this study the three tests proved reliable, with alpha values of 0.95 or above in each case. However, the correlations between the three vocabulary tests, while statistically significant and moderate to strong, are not as strong as those in the Masrai & Milton study. Nonetheless, factor analysis identifies only one factor, so all three tests must be measuring the same factor, and this must be general vocabulary knowledge, because the GVST contains no AWL or BWL words whereas the AVST and BVST contain words from across the general frequency bands.
The Masrai & Milton study also produced slightly different mean scores on the GVST and AVST, and these differences were not statistically significant. In this study, however, the t-tests reported in Table 6 indicate that the differences in the means are greater and statistically significant. Part of the issue appears to lie in the GVST scores, which are closer to the maximum than scores on the other tests; this suggests that the participants in this study have a higher level of general English than those in the earlier study. One outcome is an apparent ceiling effect in the GVST results, particularly visible in Fig. 2, where GVST and BVST scores are compared. This ceiling effect means that the relationship between the two tests is not entirely linear, which must affect the correlations between GVST and other variables such as GPA and BusGPA, and the amount of variance in these variables that the test can explain. The strongest correlation among the vocabulary tests is between AVST and BVST, r = .74, and this might tentatively be interpreted as support for Hyland and Tse's (2007) suggestion that the AWL contains polysemous words which have specialist meanings in addition to more generally used ones. The AVST may therefore function as a specialist vocabulary test as well as a test of academic and general vocabulary knowledge. This idea might additionally be supported by the observation that the lowest correlation is between GVST and BVST (r = .53).
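The mean comparisons and the ceiling effect can both be checked with short calculations. The text does not state which t-test was used; because the same students took all three tests, this sketch assumes a paired-samples t-test, and the ceiling check (the share of scores within a chosen margin of the 114-point maximum) uses a hypothetical margin of 5 points. The scores are toy values.

```python
import math
import statistics


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


def share_near_ceiling(scores, maximum=114, margin=5):
    """Proportion of scores within `margin` points of the maximum."""
    return sum(score >= maximum - margin for score in scores) / len(scores)


gvst = [110, 112, 104, 113, 98]  # toy scores
avst = [92, 101, 85, 99, 80]
print(paired_t(gvst, avst))
print(share_near_ceiling(gvst), share_near_ceiling(avst))
```

A large share of scores near the maximum compresses variation at the top of the scale, which is the mechanism by which a ceiling effect can depress correlations with GPA.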

Vocabulary knowledge and academic success
The principal objective of this study is to test the idea that subject specialist vocabulary knowledge can have a significant and measurable impact on academic performance, separate from and additional to the impact of general and academic vocabulary knowledge. All three vocabulary measures produce statistically significant correlations with both academic performance measures although, as Table 2 shows, these correlations are modest at best. While the general vocabulary measure, GVST, produces the highest correlations with both GPA and BusGPA, and may be expected to have the greatest impact on academic performance, the other correlations are of similar size, so the AVST and BVST measures may make independent contributions to the variance in the academic performance measures. The usefulness of Nation's (2001) vocabulary types seems to be borne out. The multiple regression analyses allow the inter-relationship of the three vocabulary measures, and the significance of specialist vocabulary knowledge, to be seen.
The results of the regression analysis involving GPA (Table 4) indicate that GVST alone can explain 14.2% of the variance in the students' GPA. This fits with the findings of other studies, such as Masrai & Milton (2017), Roche and Harrington (2013), Townsend et al. (2012) and Szabo et al. (2020), that general vocabulary size has a significant impact on academic performance, though the explanatory power of this variable is smaller here than in previous studies. The ceiling effects noted in the students' GVST results may be producing this difference. The GVST samples only the most frequent 6000 words of the BNC/COCA lists; this range is an artefact of a construction that allows frequency effects to be controlled across the multiple vocabulary measures used in this study. A test covering a wider range of frequency bands might, with these students, produce higher correlations and greater explanatory power for this variable. The analysis further suggests that the other two vocabulary measures make separate and additional contributions to the explanatory power of vocabulary in academic success: BVST adds 1% to R², and AVST a further 0.5%. The regression analysis involving BusGPA reveals a very similar outcome, although it was expected that, because of the business-specific nature of the modules involved in this measure, the explanatory power of the specialist vocabulary variable might be greater. The results in Table 5 show that general vocabulary knowledge, as measured by GVST, has the greatest explanatory power and can explain 18% of the variance in BusGPA. The other two vocabulary measures again contribute separately and additionally: BVST adds 0.9% to the model, and AVST a further 0.4%.
It can be argued, therefore, that the three types of vocabulary knowledge make independent and separate contributions to academic success in the specialist area. However, the general vocabulary measure, GVST, is the most important, and in both analyses it has the greatest power in explaining variance in academic success as measured by GPA and BusGPA. The data also allow the broad aim of the study to be met: it can be argued that specialist vocabulary knowledge, as measured by BVST, makes a separate and measurable contribution to academic achievement additional to general and academic word knowledge. Indeed, the contribution of specialist word knowledge appears greater than that of knowledge of the AWL for both academic success measures. Although small against both success measures, specialist vocabulary knowledge appears to explain about double the variance explained by AWL knowledge. Hyland and Tse's (2007) suggestion that more attention should be paid to specialist vocabulary knowledge, as opposed to generalised academic vocabulary, appears to be borne out. Not only is the explanatory power of general vocabulary knowledge smaller in this study than in others; the explanatory power of AWL knowledge is also much smaller. The Masrai & Milton (in press) study found that AWL knowledge explained more than 11% of the variance in academic success, whereas in this study it explains 0.5% or less. Ceiling effects are less evident in the AVST data than in the GVST data, so it may be, as Hyland and Tse (2007) suggest, that polysemy in some AWL words is having an effect here. They suggest that the AWL may contain words that are, in effect, specialist vocabulary items, and that the AWL's contribution to coverage and its explanatory power in academic performance might be explained, at least in part, by the importance of these words as specialist vocabulary rather than as features of a generalised academic vocabulary.
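The "about double" comparison can be checked from the R² increments reported for the two regression models, in percentage points, taking each model's increments as additive:

$$
\begin{aligned}
\text{GPA:}\quad & R^2 = 14.2 + 1.0 + 0.5 = 15.7\%, & \frac{\Delta R^2_{\mathrm{BVST}}}{\Delta R^2_{\mathrm{AVST}}} &= \frac{1.0}{0.5} = 2.0\\
\text{BusGPA:}\quad & R^2 = 18.0 + 0.9 + 0.4 = 19.3\%, & \frac{\Delta R^2_{\mathrm{BVST}}}{\Delta R^2_{\mathrm{AVST}}} &= \frac{0.9}{0.4} = 2.25
\end{aligned}
$$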
In this study, the specialist vocabulary test, BVST, may be picking up this specialist factor rather better than the AVST does. The smaller share of explanatory power left to the AVST may therefore be a better estimate of the relative importance of generalised academic vocabulary knowledge to academic success.
This, in turn, has implications for vocabulary teaching in the classroom, and the idea suggested by Gardner and Davies (2014) that academic vocabulary plays a central role in academic success can be questioned. Growing a large general vocabulary is shown in this study, as in previous ones, to have a much greater effect on academic success than knowledge of academic vocabulary specifically. As Fig. 3 shows, a large vocabulary is, of course, no guarantee of academic success: while a low general vocabulary is always associated with low academic scores, a large general vocabulary can be associated with both high and low academic scores.
In this study, however, the importance of academic vocabulary knowledge to teaching is diminished further. The results suggest that knowledge of specialist vocabulary may also have a greater impact on academic success than academic vocabulary knowledge, and might be given greater priority in teaching. However, this study has taken business as its area of specialist vocabulary, and it has already been noted that business is an area where the AWL has had a greater impact on coverage in academic English than in the other broad discipline areas Coxhead (2000) investigated. Whether this relationship between the three types of vocabulary can be replicated in other disciplines remains a matter of speculation.

Conclusion
This study is an initial investigation into measuring the contribution of specialist vocabulary knowledge to L2 learners' academic achievement, controlling for the effect of general and academic word knowledge. The principal conclusion is that specialist vocabulary knowledge has a measurable impact on academic success separate from and additional to the impact of either general or academic vocabulary knowledge. Its impact on academic success is still less than that of general vocabulary size, which, as in other studies, explains the most variance of the vocabulary factors. In this model, however, specialist vocabulary seems to have a greater impact on academic success than academic vocabulary knowledge. This may be because a test of academic vocabulary knowledge is also, in part, a test of specialist vocabulary knowledge, and in this study that effect may be factored out by the presence of the specialist vocabulary test. Alternatively, academic vocabulary knowledge may genuinely be less important than the other types of vocabulary knowledge, which might suggest that knowledge of the AWL is overrated in teaching and that more attention should be paid to specialist vocabularies.
These are the results of one study, and future research will need to establish whether they are replicable. It is particularly worth asking whether they can be repeated with the vocabularies of other specialist academic areas, for example areas where the impact of the AWL on coverage is smaller. It should also be noted that vocabulary explains less of the variance in academic success in this study than in previous studies, and this has been attributed to the ceiling effect in the general vocabulary test results. It would be useful to try this type of study with a different population, or with amended tests, where ceiling effects are not present.