The global digitalization of education is changing the process of knowledge transfer, the methodology of education and self-education, and teaching media, and it brings digital data into scientific research projects, which allows educational technologies to be taken to a new level. The development of Big Data systems, including search engine results and text corpora, opens up new possibilities and allows research problems to be set and solved efficiently. The rapid spread of digital technology opens up new opportunities for today's researchers in modern teaching activities. The article focuses on the capabilities and verification potential of search engines and on the procedure for implementing search queries in Big Data in semantic research. The author describes an experimental methodology of semantic research that uses the Google and Yandex search engines as an effective tool for analysing semantic structure. Empirical evidence of the effectiveness of this technique is provided as well. The data of the Yandex and Google search engines coincide in determining dominant values. At the same time, the Google data allow the semantic boundaries of the studied phenomena to be determined more accurately and fully than the data provided by the text corpora.
Keywords: Big Data, educational search technologies, search engines
Education ultimately assists learning by transferring knowledge and skills. Having reliable information about Big Data as an advanced digital technology, and properly evaluating its potential, is one of the most essential tasks and needs of modern education. The rapid spread of digital technology in modern teaching activities opens up new opportunities for today's students and researchers. The global digitalization of education is changing the process of knowledge transfer, the methodology of education and self-education, new teaching media, as well as
Artificial intelligence can contribute to the education industry in various ways:
It appears that, in order to categorize the linguistic units under analysis qualitatively and to carry out a quantitative analysis, the researcher can take advantage of the possibilities and verification potential of search engines (
Purpose of the Study
The research question determines the purpose of the study:
to define Big Data query parameters;
to determine experimentally their validity or non-validity.
1) We specify the search parameters of the lexical units or syntactic constructions under investigation: text corpora (BNC, COCA, etc.), with restriction by type of discourse; the Yandex and Google search engines; the Google Trends web application; and time slices (text corpora support searches with these parameters where they are significant for the research).
2) The validity of the search results is determined by the following factors. Firstly, the lexemes entered for the search must strictly correspond to the lexico-semantic variants within the semantic structure under study. For example, the verb search parameters entered into the search engine accurately reflect their lexico-semantic variants:
Although the quantitative data provided by search engines vary, systematic monitoring and recording of access dates have made it possible to establish their validity, provided that quantitative methods (methods of mathematical statistics) are used to interpret the results and the date of access to the search engine is recorded.
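The monitoring procedure described here can be illustrated with a minimal Python sketch. It is not the author's tooling: the names `HitCount`, `quoted_queries` and `variation`, the query template, and the use of the coefficient of variation as a stability measure are all assumptions made for illustration. Result counts are assumed to be read from the search engine page and recorded by hand together with the access date, and the numbers in the usage lines are placeholders, not real search results.

```python
from dataclasses import dataclass
from datetime import date
from itertools import product
from statistics import mean, pstdev


@dataclass(frozen=True)
class HitCount:
    """One manually recorded result count for an exact-phrase query (hypothetical record type)."""
    engine: str      # e.g. "Google" or "Yandex"
    query: str       # the quoted search string exactly as entered
    accessed: date   # access date, recorded at the time of the search
    hits: int        # result count reported by the engine


def quoted_queries(verbs, nouns, template="{verb} with {noun}"):
    """Build one exact-phrase (quoted) query per verb/noun combination."""
    return [f'"{template.format(verb=v, noun=n)}"' for v, n in product(verbs, nouns)]


def variation(records):
    """Return {(engine, query): (mean hits, coefficient of variation)} over all access dates."""
    grouped = {}
    for r in records:
        grouped.setdefault((r.engine, r.query), []).append(r.hits)
    summary = {}
    for key, counts in grouped.items():
        m = mean(counts)
        cv = pstdev(counts) / m if m else 0.0
        summary[key] = (m, cv)
    return summary


# Usage with placeholder values (NOT real search results).
queries = quoted_queries(["shudder", "tremble"], ["fear", "disgust"])
log = [
    HitCount("Google", queries[0], date(2020, 1, 10), 1000),
    HitCount("Google", queries[0], date(2020, 1, 17), 1200),
]
for (engine, query), (avg, cv) in variation(log).items():
    print(engine, query, avg, round(cv, 3))
```

Keeping the access date in every record makes it possible to compare counts across time slices; a low coefficient of variation for a query would suggest that its counts are stable enough for further statistical interpretation.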
Traditionally, semantic research uses the hypothetico-deductive method (O. N. Seliverstova, O. A. Suleymanova, L. V. Shcherba, A. Mustajoki, O. V. Belaychuk, etc.). The procedure for conducting semantic research is multi-stage (Fomina, 2007). First and foremost, the study begins with the collection of material (examples of the use of the language units under study, collected from authentic sources) and the analysis of dictionary definitions of semantically similar language units from lexicographic sources. Currently, the resources that provide the researcher with sufficient language material are mainly text corpora.
First and foremost, we use the National Corpus of the Russian Language (NCRL) to study language units (Nesset, 2019, p. 157). To study language units of English, depending on the type of discourse and the purpose of the study, we use the British National Corpus (BNC), the Corpus of Contemporary American English (COCA), the Corpus of Historical American English (COHA), Time Magazine Corpus English (TIME), the Wikipedia Corpus (WC), the Corpus of American Soap Operas (CASO) and the like (Sha, 2010). Using the methods of corpus linguistics, it is possible to obtain linguostatistical data on the frequency of use of a language unit and, accordingly, to establish contexts, distinguish genre and stylistic features of units, and then put forward a hypothesis based on the analysis of their distribution. For example, researching into the semantic structure of the English verb
Secondly, the researcher classifies the language material, highlighting the integral and differential features of the studied units in order to form mutually exclusive groups. Content analysis is ‘a research technique for making replicable and valid inferences from data to their contexts’ (Krippendorff, 2018, p. 403). Accordingly, it could be argued that content analysis is used as a method to demonstrate real-time communication patterns, where the inferences that arise from objectively and systematically investigating the meanings embedded in communication develop a shared set of interpretations that can be further replicated, owing to their focus on objectivity, validity and explicit rules (Mihailescu, 2019). Thirdly, to verify meaning, a representative sample is drawn from the mutually exclusive groups; an experimental sample is formed and given to a native-speaker informant, who assesses whether the statements are acceptable (adequate) or not, in order to verify the primary hypothesis. Finally, a secondary verification of the experimental sample by informants is carried out, varying semantically similar elements within an identical statement, so that an adequate interpretation of the studied language units can be formed.
At the same time, the experimental method as an educational search technology assumes the presence of both an informant and the use of
Therefore, in order to reduce the objective ‘noise’ in the source data, the possibilities of implementing the
Quoted verb lexemes that accurately reflect the syntactic construction ‘sodrogat’sya, drozhat’ i vzdragivat’ ot otvrashcheniya, strakha, zlosti, nezhnosti, styda, radosti, udivleniya’ (potential English correlates: shudder, tremble, shiver with disgust, fear, anger, tenderness, shame, joy, amazement) are entered into the search string one after another. In other words, to clarify the semantics of the verbs ‘sodrogat’sya, drozhat’ i vzdragivat’ (potential English correlates: shudder, tremble, shiver), we conduct quantitative monitoring of data on their compatibility with the designations of basic emotions in order to clarify the distributional boundary, since preliminary corpus analysis showed the frequency compatibility of this group of verbs (see Table
Having identified the variation in the Big Data, we investigated the validity or invalidity of the data obtained. Thus, in the Yandex and Google search databases, at intervals of 15 months, 1 week and on several consecutive days, we entered the quoted syntactic structures
It should be noted that the statistical analysis of empirical data in the semantic experiment involves grouping the frequency data for the list of queried syntactic constructions, that is, finding the relative frequency (f): the observed frequency of a feature or unit, i.e. the absolute frequency (n), divided by the total number of observations of the studied phenomenon (Levitsky, 2007, p. 79). So, at the absolute frequency of the verb ‘
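The relative frequency defined above, f = n / N, can be computed with a short Python sketch. The function name `relative_frequencies` is hypothetical, and the sketch assumes that the total number of observations N is the sum of the absolute counts recorded for all collocations of one verb; the counts in the usage lines are placeholders, not data from the study.

```python
def relative_frequencies(absolute):
    """Map each collocation to f = n / N.

    `absolute` maps a collocation label to its absolute frequency n.
    N is assumed here to be the sum of all absolute frequencies.
    """
    total = sum(absolute.values())
    if total == 0:
        raise ValueError("no observations recorded")
    return {label: n / total for label, n in absolute.items()}


# Usage with placeholder counts (NOT results from the study).
counts = {"fear": 60, "disgust": 30, "joy": 10}
freqs = relative_frequencies(counts)

# List collocations from the most to the least frequent.
for label, f in sorted(freqs.items(), key=lambda item: item[1], reverse=True):
    print(f"{label}: {f:.2f}")
```

Working with relative rather than absolute frequencies makes counts from search engines with very different index sizes comparable with one another.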
Statistical analysis of constructions
Statistical analysis of the startle constructions showed that the Yandex search engine also records only two dominant collocations expressing negative emotions of
Statistical analysis of the trembling constructions showed that the Yandex search engine captures three dominant collocations expressing negative emotions to
As can be seen, the search engine data revealed that the verbs under study belong to a common semantic field, but their scope is different or limited. So, the verb
Providing any research on the basis of
To sum up, Big Data search tools offer the following:
Almost unlimited amounts of data (empirical material) (Suleimanova & Petrova, 2018)
Data are generated at high speed
Consistency and validity of ‘quoted’ data
The results are processed using methods of mathematical statistics
Search engines are an efficient and accurate tool in semantic research
Consequently, the development of Big Data systems and search engines creates unique opportunities for linguistic research as an educational technology.
- Demchenko, V. V. (2019). K voprosu o «logicheskih krugah» v leksikograficheskoj praktike: komponentnyj analiz gruppy glagolov tipa “shudder” [To the question of ‘logical circles’ in lexicographic practice: component analysis of a group of verbs as shudder and the like]. Filologiya. Teoriya yazyka. Yazykovoye obrazovaniye, 2(34), 109-113.
- EMC Education Services (Ed.). (2015). Data Science & Big Data Analytics. John Wiley & Sons.
- Fomina, M. A. (2007). Konceptualizaciya «pustogo» v anglijskom yazyke (empty, free, blank, spare, unoccupied, vacant i void) [Conceptualization of ‘empty’ in English (empty, free, blank, spare, unoccupied, vacant and void)]. Vestnik MGLU. Series Lingvistika, 541, 272–281.
- Jones, M. N., & Dye, M. W. (2018). Research methods: Big data approaches to studying discourse processes. In M. F. Schober, D. N. Rapp, & M. A. Britt (Eds.), Routledge handbooks in linguistics. The Routledge handbook of discourse processes (pp. 117–124). Routledge/Taylor & Francis Group.
- Levitsky, V. V. (2007). Kvantitativnye metody v lingvistiki [Quantitative methods in linguistics]. Nova Kniga Publ.
- Mihailescu, M. (2019). Content analysis: a digital method. https://www.researchgate.net/publication/333756046_Content_analysis
- Krippendorff, K. (2018). Content analysis: An introduction to its methodology. Sage publishing.
- Leung, B. T. H., Xie, J., Geng, L., & Pun, P. N. I. (2019). Search Engines: Transferring Information Literacy Practices. Shanghai Jiao Tong University Press.
- Namenwirth, J. Z., & Weber, R. P. (2016). Dynamics of culture. Routledge.
- Nesset, T. (2019). Big data in Russian linguistics? Zeitschrift für Slawistik, 64(2), 157-174. https://doi.org/10.1515/slaw-2019-0012
- Roberts, C. W. (Ed.). (1997). Text analysis for the social sciences: Methods for drawing statistical inferences from texts and transcripts. Lawrence Erlbaum Associates.
- Sha, G. (2010). Using Google as a super corpus to drive written language learning: A comparison with the British National Corpus. Computer Assisted Language Learning, 23(5), 377-393.
- Shcherba, L. V. (2004). O troyakom aspekte yazykovyh yavlenij i ob eksperimente. YAzykovaya sistema i rechevaya deyatel'nost' [About the triple aspect of linguistic phenomena and about an experiment in linguistics]. Editorial URSS Publ.
- Suleimanova, O. A., & Demchenko, V. V. (2018). Ispol'zovanie big data v eksperimental'nyh lingvokognitivnyh issledovaniyah: analiz semanticheskoj struktury glagola shudder [Using big data in experimental linguocognitive studies: analysis of the semantic structure of the verb shudder]. Kognitivnye issledovaniya yazyka, 33, 466-472.
- Suleimanova, O. A., & Lukoshus, O. G. (2015). Znachenie yazykovogo znaka kak lingvisticheskaya konstanta [The meaning of a linguistic sign as a linguistic constant]. Proceedings of the conference “The humanities: issues and development trends”, Innovatsionnyj tsentr razvitiya obrazovaniya i nauki (pp. 78-83).
- Suleimanova, O. A., & Petrova, I. M. (2018). Eksplanatornyj potencial teorii klassov dlya lingvisticheskogo issledovaniya: poryadok sledovaniya opredelenij [Explanatory potential of the theory of classes for linguistic research: Word order in attributive group] (pp. 52–64). Filologiya: Nauchnye issledovaniya.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
About this article
20 November 2020
Sociolinguistics, discourse analysis, bilingualism, multilingualism
Cite this article as:
Nikitina, V. V. (2020). Verification Potential Of Educational Search Technologies (Big Data) In Semantic Research. In E. Tareva, & T. N. Bokova (Eds.), Dialogue of Cultures - Culture of Dialogue: from Conflicting to Understanding, vol 95. European Proceedings of Social and Behavioural Sciences (pp. 665-672). European Publisher. https://doi.org/10.15405/epsbs.2020.11.03.70