Lexical Bundles in Academic Writing: The Issue of Specificity


Contextual knowledge of a word is closely related to knowledge of phraseological sequences, as words are often used in phraseological forms. Owing to the importance of phraseological knowledge, much work has examined phraseological sequences for various purposes, including English for Academic Purposes (EAP). In EAP settings, scholars have argued for two different approaches, i.e. discipline-specific and common-core. It is therefore necessary to examine the issue of specificity in EAP with regard to the use of phraseological sequences such as lexical bundles. This study aims to identify lexical bundles in journal articles in the field of International Business Management (IBM). Following a corpus-driven approach, the corpus analysis software Collocate 1.0 was used to extract three- to five-word combinations. These combinations were manually checked to exclude meaningless combinations. To determine the degree of specificity of the lexical bundles, the final lists of lexical bundles were compiled and compared with the lexical bundles in the Academic Formulas List (AFL) using the log-likelihood test. The comparison reveals that lexical bundles in the IBM corpus are relatively specific compared with the lexical bundles in the AFL, which were derived using a common-core approach. A discipline-specific approach to the teaching and learning of lexical bundles in EAP settings is therefore advocated to enhance EAP syllabuses and instruction.

Keywords: English for Academic Purposes, corpus-driven, phraseological sequences, lexical bundles, specificity


Academic study and writing place unique demands on language users, as the constructions and patterns of academic work are very different from the conventions that language users are more familiar with, for example the conversational register. Thus, the process of adapting to a hitherto unfamiliar register may pose difficulties for language learners at tertiary institutions. Research has found that university students have problems using accurate and effective expository language in the academic register. In the case of non-native students in particular, these problems are compounded by the additional complexities involved in mastering the language itself. In most cases, university students have to write and publish academic research articles despite not having received the necessary training for the task. In response to this shortcoming, scholars as well as practitioners in the field of English for Academic Purposes (EAP) have started examining the linguistic and textual features of academic writing in various disciplines from linguistic and pedagogical perspectives.

Problem Statement

With the flourishing of corpus-driven phraseological research over the last decade, attention has shifted to examining and building lists of academic phraseological sequences for EAP curricula. For instance, in a corpus-driven study of academic discourse, Simpson-Vlach and Ellis (2010) combined a statistically driven approach with teacher insights to identify and extract a list of the most useful lexical bundles, which they termed the Academic Formulas List (henceforth AFL). Simpson-Vlach and Ellis (2010) identified academic lexical bundles that are common to many academic disciplines, occur with high frequency, and are of general academic use. They therefore concluded that a general approach to EAP is sufficient to derive lists of common-core academic phrases that transcend disciplinary boundaries. Their conclusion followed ideas pioneered by Zamel (1993), who strongly advocated a common-core approach to EAP courses whereby EAP instructors should focus on language forms common to all disciplines. Nevertheless, it has been argued that each academic discipline has its own subject-specific conventions (Green & Lambert, 2018). Hyland (2002, 2006), a strong proponent of a discipline-specific approach to lexical bundles, refuted the idea of core academic clusters by demonstrating variations in the frequencies and functional uses of academic lexical bundles in different academic domains. According to Hyland, there is a significant amount of formality in academic texts, which are characterised by the use of subject-specific vocabulary. The issue of specificity has thus challenged instructors and linguists in the field of EAP to take a stance on how language should be perceived, that is, whether language forms and features are transferable across different academic disciplines or specific to particular fields or disciplines. Views differ with regard to the approaches to phraseology for EAP, and the issue is still debated in the field.
It is therefore necessary for researchers to continue exploring phraseological sequences such as lexical bundles in academic discourse for the sake of further enhancing EAP instruction and curricula.

Definition and Previous Studies on Lexical Bundles

Lexical bundles were first defined and studied in detail by Biber, Johansson, Leech, Conrad, and Finegan (1999) in a chapter of the Longman Grammar of Spoken and Written English (henceforth LGSWE), their comprehensive corpus study of the grammar of English. This seminal work deserves attention here, as most studies on lexical bundles are largely based on the definition and framework proposed by Biber et al. (1999). According to Biber et al. (1999: 989-990), lexical bundles are “bundles of words that show a statistical tendency to co-occur… as recurrent expressions, regardless of their idiomaticity, and regardless of their structural status”. Lexical bundles are seen as sequences of word forms that are found frequently in both written and spoken discourse. They are usually identified empirically and extracted automatically from a corpus using corpus analysis software. In academic genres, numerous studies have examined the use of lexical bundles by native and non-native speakers, and by expert and novice writers (e.g., Chen & Baker, 2016; Pan et al., 2016; Bychkovska & Lee, 2017; Esfandiari & Barbary, 2017; Kwary et al., 2017; Hyland & Jiang, 2018; Shin et al., 2018; Lu & Deng, 2019; Shin, 2019; Jeong & Jiang, 2019; Wright, 2019). Nevertheless, little is known about the approaches to lexical bundles in academic settings, as the issue of specificity remains largely unexamined.

Research Questions

Specifically, this study addresses the following question:

How do lexical bundles in journal articles in the field of International Business Management (IBM) differ from those in AFL?

Purpose of the Study

This study compares lists of lexical bundles representing IBM and the AFL (Simpson-Vlach & Ellis, 2010) to determine the specificity of the lexical bundles in this study. Following a common-core approach, the AFL is a list of lexical bundles retrieved from a corpus of academic writing sampled across four academic disciplines: Humanities and Arts, Social Sciences, Natural Sciences/Medicine, and Technology and Engineering. In contrast, the lexical bundles identified in this study were extracted from a specialised corpus that contains only journal articles in the field of IBM.

Research Methods

The present study employed a corpus-driven approach to identify and extract three- to five-word lexical bundles in a one-million-word corpus of 138 original research articles taken from two international peer-reviewed journals relevant to IBM.

Identification of Lexical Bundles

The corpus analysis software Collocate 1.0 (Barlow, 2004) was employed to extract lexical bundles automatically by setting the span options, i.e. three to six words. Collocate 1.0 extracts lists of word combinations using two statistics: frequency and Mutual Information (MI). Following the literature, three- to five-word combinations that occur at least 20 times per million words in the corpus and achieve an MI value of at least 3.0 were extracted. These combinations were then manually inspected to exclude meaningless combinations that had been extracted automatically. To determine the degree of specificity of the lexical bundles, the lists of lexical bundles were compiled and compared with the lexical bundles in the AFL (Simpson-Vlach & Ellis, 2010) using the log-likelihood test.
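The two thresholds described above can be illustrated with a minimal Python sketch. It is not the procedure implemented in Collocate 1.0, whose internal MI computation is not described here. The function name `extract_bundles` and its parameters are hypothetical, and the MI formula (the base-2 logarithm of the observed probability of the sequence divided by the product of the probabilities of its words) is one common formulation that is assumed for illustration only.

```python
import math
from collections import Counter


def extract_bundles(tokens, n_min=3, n_max=5,
                    min_per_million=20.0, min_mi=3.0):
    """Return {ngram: (frequency per million, MI)} for n-grams that
    pass both thresholds. Hypothetical helper, not Collocate's API."""
    total = len(tokens)
    unigrams = Counter(tokens)
    results = {}
    for n in range(n_min, n_max + 1):
        # Count every contiguous n-word sequence in the token list.
        ngrams = Counter(tuple(tokens[i:i + n])
                         for i in range(total - n + 1))
        for gram, count in ngrams.items():
            # Normalise the raw count to occurrences per million words.
            per_million = count * 1_000_000 / total
            if per_million < min_per_million:
                continue
            # Assumed MI: log2(observed probability / expected probability
            # if the words occurred independently of one another).
            p_observed = count / total
            p_expected = math.prod(unigrams[w] / total for w in gram)
            mi = math.log2(p_observed / p_expected)
            if mi >= min_mi:
                results[gram] = (per_million, mi)
    return results
```

In practice the token list would come from the tokenised corpus, and the resulting candidates would still require the manual inspection step described above, since neither threshold can exclude meaningless combinations on its own.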


A total of 1055 lexical bundles of varying lengths remained on the list after the manual filter. These 1055 lexical bundles make up 2.19% of the more than one million words in the current corpus. Table 1 compares the top 50 lexical bundles in the IBM corpus with the top 50 core academic lexical bundles proposed by Simpson-Vlach and Ellis (2010). The purpose of this comparison was to determine the specificity of the lexical bundles in this study. To reiterate, Simpson-Vlach and Ellis’s list of academic formulas is a cross-disciplinary list that uses a common-core approach to compile lexical bundles common to various academic disciplines. In contrast, the list of lexical bundles identified in the IBM corpus is a discipline-specific list, representing phraseological sequences that are seen as specific to IBM. As shown in Table 1, different types of frequent academic lexical bundles are found in the IBM corpus and the AFL.

Table 1 -

Table 2 presents the list of lexical bundles common to the IBM corpus and the AFL. Of all the frequent lexical bundles in the IBM corpus, 36% also appear in the AFL. In addition, the log-likelihood test shows that more than 70% of the shared lexical bundles are more specific to the IBM corpus. The results of the comparison indicate that the lexical bundles in the IBM corpus are relatively specific compared with the lexical bundles in the AFL. A discipline-specific approach to the teaching and learning of lexical bundles for EAP is seen as necessary because, in this study, more than 60% of the lexical bundles were not found in the AFL.
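For readers unfamiliar with the log-likelihood test used in this comparison, the following Python sketch shows one widely used two-corpus formulation. The exact variant applied in this study is not specified, so the formula is an assumption, and the function name and the frequencies in the usage note are hypothetical rather than taken from the IBM corpus or the AFL.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood (G2) statistic for one bundle observed freq_a times
    in a corpus of size_a words and freq_b times in a corpus of size_b
    words. Assumed two-corpus formulation, for illustration only."""
    combined = freq_a + freq_b
    # Expected counts if the bundle were equally frequent in both corpora.
    expected_a = size_a * combined / (size_a + size_b)
    expected_b = size_b * combined / (size_a + size_b)
    ll = 0.0
    # A zero observed count contributes nothing to the sum.
    if freq_a > 0:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2 * ll


def more_frequent_in_a(freq_a, size_a, freq_b, size_b):
    """True when the relative frequency is higher in corpus A."""
    return freq_a / size_a > freq_b / size_b
```

For example, with hypothetical counts, `log_likelihood(100, 1_000_000, 20, 1_000_000)` exceeds 3.84, the chi-square critical value for p < 0.05 with one degree of freedom, and `more_frequent_in_a` indicates the direction of the difference. A shared bundle would be treated as more specific to the first corpus when both conditions hold.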

Table 2 -


The findings of the study indicate that academic lexical bundles are discipline-specific. These findings have implications for how EAP should be taught in language classrooms at tertiary institutions. The outcome of the analysis suggests that EAP instructors should follow a discipline-specific approach, particularly in the teaching of phraseological sequences such as lexical bundles. To sum up, there are two different views on how instructors and researchers approach EAP, and the issue remains open to debate in the field. Scholars in the field need to continue examining the various forms of phraseological sequences in academic discourse to enhance EAP instruction and syllabuses for the benefit of language learners at tertiary institutions.


This work was supported by Universiti Sains Malaysia Short Term Grant (304/PHUMANITI/6315044).



Hong, A. L. (2020). Lexical Bundles in Academic Writing: The Issue of Specificity. In N. Samat, J. Sulong, M. Pourya Asl, P. Keikhosrokiani, Y. Azam, & S. T. K. Leng (Eds.), Innovation and Transformation in Humanities for a Sustainable Tomorrow, vol 89. European Proceedings of Social and Behavioural Sciences (pp. 695-701). European Publisher.