The shift towards the visual in mass media has brought a new understanding of the role of the image in multimodal texts. Images play a highly significant role in meaning making. The interpretation of a multimodal text is fraught with problems arising from the interaction of its visual and verbal parts. The present study aims to explore the possibilities and challenges of interpreting multimodal texts. The sample materials are readers’ comments on a photo essay, posted on the Esquire magazine website and on thematic Internet forums. Using the category of informativeness and the closely related categories of evaluation and modality, the authors propose a system of multimodal text analysis based on three types of information: factual, conceptual and subtext. The study has produced the following findings. 1) Readers mostly demonstrate superficial comprehension of multimodal texts, limited to associations and emotional perception; they miss some of the conceptual and subtext information and show little critical thinking. 2) The readers’ comments were quite similar, which points to a single-angle, stereotyped perception of the text. 3) Interpretation of a multimodal text at all levels requires a skilful, visually literate reader. 4) More objective findings require analysis of comments by “more skilful” readers such as photo critics, journalists and art critics, who can comprehend the conceptual and subtext information in depth.

Keywords: multimodal text, photo essay, factual information, conceptual information, subtext information


The recent rise of digital communication, supported by various Internet and social media platforms, involves people in construing meanings from visual representations of reality, which poses significant challenges to the ‘semiotics’ and ‘grammars’ of visual communication (Jones, 2019).

The visual turn reflected in the humanities has led to the new functioning of multimodal texts: the image is no longer an additional element to the verbal text, but it has become the basic mode of existence of the modern culture (Drozdova, 2014; Jewitt, 2005). Visualization of information is created on two levels: formal and content-related (Simakova, 2015). So visualization can be the basis for meaning making.

The problems of interpreting multimodal texts have been the focus of both Russian and foreign researchers (Kolody, 2011; Bou-Franch & Garcés-Conejos Blitvich, 2018; Jones, 2019). These problems are connected with the interaction of different semiotic codes, each of which suggests its own interpretations and limitations (O’Halloran & Smith, 2011). There are opposing views on the interaction of the visual and verbal parts of the text (Michurin, 2014). Some researchers believe that the image itself creates a significant number of interpretational options and thus a situation of uncertainty, whereas the verbal text, interacting with the image, removes this uncertainty and specifies the communicative intention of the author. Other researchers, on the contrary, believe that the image should remove the ambiguity of interpretation of the verbal text and organize the visual perception of the multimodal text (Michurin, 2014). M.B. Voroshilova (2013) supports the latter opinion and claims that the added image defines the meaning of the text and thus imposes limitations on its perception and narrows the possibilities of its interpretation. On the other hand, researchers analyzing advertising texts have concluded that non-verbal components increase the possibilities of interpretation (Voroshilova, 2014).

Inductive text studies have addressed specific basic categories such as cohesion, intertextuality, cognitive structure and conceptual metaphor. However, informativeness has not yet been analyzed in detail as a text category, although it is the main category of the media text. Based on the categories of informativeness, evaluation and modality, we conceptualise a system of multimodal text analysis and then investigate how deeply readers understand all levels of meaning of a multimodal text. It is not only the semantics of the multimodal text that is important for us but also how it is perceived by readers.

Our research questions are: 1) What are the main characteristics of multimodal texts in terms of their informativeness, evaluation and modality? 2) What are the characteristic features of photo essays? 3) How is the informativeness of multimodal texts perceived and evaluated by the reader? 4) What are the readers’ limitations and problems in interpretation of the multimodal text? What are the causes?

The present study aims to explore the concept of informativeness of the multimodal text and to identify the possibilities and challenges readers face in interpreting the photo essay.

The research methods are as follows: the inductive approach to multimodal text analysis and the methods of descriptive analysis, content analysis and contextual analysis. The sample materials of the study are readers’ comments on the Esquire magazine website and on thematic Internet forums.


In the framework of text linguistics, informativeness is considered to be the main category of the text (Galperin, 2007). Three types of information are distinguished: factual, conceptual and subtext. The factual information is expressed with words and phrases in their direct dictionary meanings.

The conceptual information represents the rethinking of the phenomena and the facts that are reported on the superficial text level. This type of information can be explicit and implicit. It enables different interpretations of what is happening and it represents the implementation of an emotional-subjective intention of the author. The concept is associated with deep meaning, the main idea, motives and intentions of the author who created the text (Zhinkin, 1997).

The subtext information is hidden. It complements the directly perceived information. There are two types of the subtext information: situational (it arises in connection with the facts previously described) and associative (it arises from the connection of the text with personal or public experience; it is more blurred). Understanding of the subtext develops in the process of comprehending semantic and stylistic implications of lexical, syntactic and compositional units of the text.

Analyzing the category of informativeness in relation to the multimodal text, we can state that it is characterized both by its informative value (richness of facts) and by its communicative value (communicative significance). While the informative value is an objective characteristic of the text, the communicative value can be evaluated by recipients in different ways. This characteristic refers to the reader’s interpretation of the text, and it is determined by the individual perceptual abilities of the reader (Schelkunova, 2004).

When analyzing the factual information of the multimodal text, it is important to distinguish facts from opinions. As a rule, the description of events and the opinions (comments) about them are merged in the text.

Another essential feature of multimodal texts is evaluation. Through opinion judgements, the addresser convinces the addressee of the rightness of their ideas and conveys the political, ideological and moral principles of the society (Romantsova, 2012). Most researchers associate evaluation with the term “modality”, a text category that reflects the emotional-volitional attitude of the author of the text when communicating with the reader (Matveeva, 2010).

Applying the inductive approach to the text analysis and using the category of informativeness and closely related categories of evaluation and modality, we will use the system of the multimodal text analysis based on the 3 types of information: factual, conceptual and subtext, which is suitable for our sample materials – a photo essay in the Esquire magazine.

Esquire is characterized by its visual concept, and the photo essay is its most frequently used genre. Unlike informative photo genres, which describe reality, the photo essay is characterized by interpretation, including analysis, generalization and the expression of the author’s point of view. Another characteristic feature of photo essays is broad coverage of reality together with large-scale generalizations and conclusions. The objects of the photo essay are phenomena of modern public life. The photo essay is a form of narration about human life; the sequence of images depends on the real story line (Voron, 2012).

The research material is a photo essay titled “Air Alert” by a famous Italian photographer, Paolo Patrizi (https://esquire.ru/photo/european_starlings?am_nestCmnt_reply_to=1737&comm_page=1).

The photo essay has its own specificity in terms of the relationship between the various components of the multimodal text. There are complementary relationships between the text and the images. The text message and the images provide different information that helps to clarify and supplement the information of the other code (Jones & Hafner, 2012).

The text under consideration is characterized by complete creolization and an interdependent relationship between the verbal part and the image. Without the verbal comment, the meaning of the image is not clear and can be misinterpreted, so the verbal comment performs the main function. In the photo essay “Air Alert”, the verbal text consists of one sentence: “The Italian Paolo Patrizi photographs starlings flying over Europe”. The non-verbal part comprises 19 black and white photographs, 18 of which depict bizarre shapes formed in the sky by countless birds. One photo depicts a car covered with bird droppings.

Analyzing the factual information of the verbal part of the essay as perceived by readers of the magazine and expressed in the comments posted on the website (https://vk.com/album-26953_164848696?act=comments), we can conclude that the readers fully understood the subject matter of the text but hardly commented on it: “Chaotic drawings of starling flocks, ...”; “... photographs drawings that have been created randomly in the sky by flying starling flocks...”; “the owner of the car”.

To analyse the conceptual information of the verbal part and reconstruct the concepts that were interpreted by the readers of the Esquire magazine, we use the methods of the modern conceptual analysis proposed by E.S. Kubryakova (1994). Firstly, it is necessary to analyze the strong position of the text (the heading, subheadings, the first phrase, the last phrase, etc.). Secondly, there is a need to identify the expressive-emotional figurative means which will help understand what exactly is emotionally emphasized and evaluated. Thirdly, it is important to identify the attributes of the concept, its associations, including keywords that reveal various connections in the text (synonymous, antonymic, paronymic, derivational, hypo-hyperonymic, associative connections, etc.) (Kubryakova, 1994).

The title of the photo essay is multivalent and it enters into complex semantic relationships with verbal and visual parts of the text. The verbal part is a neutral text consisting of one sentence: “The Italian Paolo Patrizi photographs starlings flying over Europe”. However, being a set expression and a war metaphor frequently used in the mass media, the title combined with the verbal text forms the conceptual meaning of “a military threat looming over Europe”.

This idea is developed by the visual part of the essay. Most of the photos capture the tops of urban roofs, trees and a large flock of starlings forming bizarre shapes in the sky. The unusual behavior of the birds evoked an emotional response in the readers’ comments: “Starling sur (surrealism)”; “Chaotic drawings of starling flocks... Surreal Starlings!”; “… Now I'm afraid of birds ...”.

In the essay, one photo stands apart from the overall composition and plays the role of a kind of counterpoint: a photo of a car that is completely covered with bird droppings. It is this single photo that reveals a second meaning of the conceptual information, namely a threat or danger presented in an ironic manner. The readers responded emotionally to this veiled irony and left ironic comments of their own: “Here are the culprits”; “It's about money”; “It’s good that cows do not fly”; “Hmm, it is unlikely that the owner of the car will make bird houses next spring”; “You have to park the car in the garage”; “Poor thing ... I really sympathize with the owner of the car”; “Tough!”

Thus, the analysis of the conceptual information of the verbal and visual parts of the photo essay and of the readers’ comments lets us conclude that the readers do not uncover the deep meaning inherent in the publication. It seems that the author’s main intention is not comprehended by the average reader-commentator. When perceiving both the verbal and the visual parts of the text, the readers understand the conceptual information that is “on the surface”. They see the visible objects and their characteristics as they are depicted in the photographs. The readers are able to interpret emotionally only the general impression that the photo creates. As the analysis of the readers’ comments showed, the informativeness (communicative significance) of such journalistic materials is reduced and revealed from a single angle, reflecting only a small part of all the interpretation possibilities.

The problems and limits of multimodal text interpretation are caused by differences in the linguistic, intellectual, emotional and aesthetic experience of the author and the reader (Schirova & Goncharova, 2007; Kravchenko, 2012; Abrosimova & Kravchenko, 2017). They may also depend on the specifics of the national language, culture and aesthetic ideas of a society in a particular historical era (Arnold, 1974).

These peculiarities of interpretation may lead to another effect: multimodal texts can prompt readers to construct conceptual meanings that were not implied by the author. The author’s interpretation of the multimodal text may differ from the interpretations of the same text by other people, including the “mainstream” interpretation shared by most recipients (Vashunina, Ryabova & Egorova, 2018).

The photo essay under study suggests subtext: the photographer uses the “strong” symbols (ambiguous or complex signs) to advance specific ideas that create subtext. In this case, the signs are arranged so that they do not conflict with each other, but deepen the meaning of the photo and cause stronger emotions.

We assumed that the comments on the photos would offer many options for interpreting the subtext information. However, the analysis showed that the readers’ judgments are repetitive. Almost all comments referring to the subtext information contained precedent phenomena related to the visual arts. An intertextual sign carries subtextual potential (Kuzmina & Abrosimova, 2015).

There are also some associations with American films. The precedent phenomena of the photo essay are related to Hitchcock’s “The Birds” and to the Dementors from the Harry Potter films: “The photos have a stronger effect on the psyche than the Hitchcock film ...”; “Now I'm afraid of birds”; “Now I understand where the image of Dementors came from in the film ‘Harry Potter’”. The readers also associate “Air Alert” with the works of Mario Giacomelli, an Italian photographer who worked in black and white and created striking visual stories: “By the way, they reminded me of Mario Giacomelli”. Another association was with surrealist art: “A critic-fighter with starling surrealism!”; “Surreal starlings!”
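The tallying step of such a content analysis can be illustrated with a minimal Python sketch. It is not the authors' coding procedure: the keyword cues in `CUES` and the helper names `code_comment` and `tally` are hypothetical, introduced here only for illustration, and simple keyword matching is a crude stand-in for a researcher's manual coding. The comments are the ones quoted in this paragraph.

```python
from collections import Counter

# Reader comments quoted in the paragraph above.
COMMENTS = [
    "The photos have a stronger effect on the psyche than the Hitchcock film ...",
    "Now I'm afraid of birds",
    'Now I understand where the image of Dementors came from in the film "Harry Potter"',
    "By the way, they reminded me of Mario Giacomelli",
    "A critic-fighter with starling surrealism!",
    "Surreal starlings!",
]

# Hypothetical keyword cues per precedent phenomenon (an assumption,
# not the coding scheme used in the study).
CUES = {
    "Hitchcock, The Birds": ("hitchcock",),
    "Harry Potter (Dementors)": ("dementor", "harry potter"),
    "Mario Giacomelli": ("giacomelli",),
    "surrealist art": ("surreal",),
}


def code_comment(text):
    """Return the labels whose cues occur in the comment (case-insensitive)."""
    lowered = text.lower()
    return [label for label, cues in CUES.items()
            if any(cue in lowered for cue in cues)]


def tally(comments):
    """Count comments per label; comments matching no cue count as 'uncoded'."""
    counts = Counter()
    for comment in comments:
        counts.update(code_comment(comment) or ["uncoded"])
    return counts


counts = tally(COMMENTS)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```

A comment such as “Now I'm afraid of birds” matches no cue and is left as "uncoded", which shows why keyword matching alone cannot replace contextual reading: its link to the Hitchcock film is visible only to a human coder.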

The comments show that the readers grasp the subtext information at the junction of the verbal and visual parts of the multimodal text by identifying verbal and non-verbal cues of the subtext. The imagery is created by the double meaning that arises from the direct and figurative meanings of the text, expressed verbally and visually. In this case, the verbal part is correlated with the factual information (Europe, Italy, birds), and the visual part is connected with the conceptual information (a threatening number of birds, chaotic movements, a specific photographic technique). Thus, I.R. Galperin’s thesis that the subtext information is created at the junction of factual and conceptual information is supported.

In addition, we have identified cases in which different readers arrived at the same subtext meanings. Repetition in a text signals its subtext (Silman, 1969). During the first reading, the reader may fail to understand the subtext information, whereas during a repeated reading the reader may perceive it through the meaningful and factual information of the text. This is a distinguishing feature of the subtext information (Galperin, 2007). Through these repetitions, commentators voice the subtext meanings they have found.


The wide use of photo essays in the modern mass media reflects the general cultural tendency towards visualization and presupposes a skilful, visually literate reader who is able to think visually. Researchers note that this literacy requires special skills from the reader. The study reveals that in most cases the readers demonstrate superficial comprehension of the visual text, limited to associations and emotional perception; they miss some of the conceptual and subtext information and show little critical thinking.

The photo essay by the famous photographer allows a wide range of interpretations, but only a small number of them were revealed in the readers’ comments. The comments turned out to be quite similar, which can probably be explained by the readers’ communicative intentions. Most readers appear to have been more interested in communicating with each other than in comprehending the photo essay in depth.

It is worth noting that the comments on the magazine’s website show a greater variety of interpretations and contain more original judgments than those on the thematic forums. For more objective findings, it appears necessary to analyze comments by “more skilful” readers such as photo critics, journalists and art critics, who can comprehend the conceptual and subtext information in full.

Of course, it is not the reader’s fault that the photo essay has not been interpreted in depth. Researchers (R. Barthes, V. Kolody, etc.) point to objective difficulties of interpretation: meanings are linked to the time of creation, the genre and the place of publication of the photographs. That is why photos can be “read” in completely different ways. As objects of cultural and philosophical analysis, photographs reveal their meaning with different degrees of intensity; this meaning may be obvious or completely vague. The essential meaning of photographs can only be described through binary oppositions: continuity and gaps, stillness and movement, studium and punctum, the desire to avoid death and death embodied (Kolody, 2011).


