Exploring User-Generated Audiovisual Translation on YouTube: Constraints and Affordances


This paper discusses the phenomenon of audiovisual user-generated translation (UGT) in light of the constraints and affordances engendered by online social media (OSM) ecosystems. It presents a short overview of audiovisual translation (AVT) as a type of constrained translation, and reflects on the phenomenon of YouTube UGT. The idea of constraints as developed in translation studies is discussed in relation to the notion of constraints and affordances of digital socio-technological spaces. Online, the inherent multimodality of audiovisual text is superposed on the multimodality of the OSM architecture, which bears a variety of affordances; this increases the complexity of AVT. Drawing on the example of a popular UGT-focused YouTube channel featuring Russian-language voiceovers, the reported study illustrates several instances revealing the interplay of AVT-inherent constraints and YouTube-specific affordances. The discussion utilizes a theoretical perspective informed by the framework of translation sociology, which offers a unique outlook on the nature of online social media UGT mediated by YouTube. Among the creative strategies which reveal the mediatized nature of YouTube-specific audiovisual translation are the following: preserving the conversational nature of the source videos by adhering to the voiceover rather than the subtitling mode of AVT; tailoring the linguistic profile of the translation (adding jargon and slang) and structuring the content stream based on feedback analysis (the YouTube comment section), as well as on the statistics, analytics and demographics of the viewer crowd; and using meta-texts (extensive notes and blog entries) to build an intertextual network of references.

Keywords: User-generated translation, online social media, audiovisual translation, constrained translation, mediatization


Owing to the plethora and ubiquity of digital media technologies, many everyday practices are becoming progressively more entangled with digital functionality. Networked environments, such as online social media (OSM), engender long-term cognitive, communicative, social and cultural shifts theorized as the processes of mediatization (Hepp et al., 2015; Zagidullina, 2017).

Much like other social practices, interlingual translation is gradually becoming more digitized and mediatized. The newly reframed image of a prototypical professional translator has begun to gravitate towards the non-literary, or business, translator (Dam & Koskinen, 2016) working in an increasingly competitive setting, where the workload is split between human and non-human agency. Another trend brought about by the omnipotence of technology is the democratization of translation. Currently, any bilingual or multilingual user can take on the role of a user-translator (Gambier, 2016), i.e., use machine translation for a variety of purposes, join a fan community to perform “fan translation, fan subbing, fan dubbing, and scantrans on deliberately chosen mangas, animated films, and video games” (Ibid.), or take part in various translation-related projects online alongside the professionals who, in turn, may “respond to a particular call which they consider worthwhile, despite a lack of remuneration” (O’Hagan, 2011).

Translations done by the “general Internet user” first gained the status of a legitimate object of research in the field of audiovisual translation (AVT) (Díaz Cintas & Muñoz Sánchez, 2006), with the early studies focusing on the peculiarities of amateur renditions of audiovisual products in a collaborative setting. Unlike collaborative or community translation, which retain a traditional language service providers’ model (Desjardins, 2017, p. 23), user-generated translation (UGT) can be done at any point, on any platform and encompasses “translation activity that is prompted and motivated by the users themselves (i.e. translation of their own content, their UGC [user-generated content], by themselves, based on their understanding of what ‘good’ or ‘effective’ translation might be)” (original emphasis) (Ibid.).

OSM have become a fruitful ground for the development of various UGT practices, including (but not limited to) the key AVT varieties such as subtitling and revoicing of UGC originally produced for the same environments. OSM translation practices cannot be solely defined as a type of fan activity, since the motivation of OSM user-translators ranges from the affinity to the object of translation to the pursuit of social popularity in the said social ecosystems (Krasnopeyeva, 2017).

Since 1988, AVT has been theorized as a type of constrained translation, which is defined as a type of interlingual transfer “...not only of written texts alone, but of texts in association with other communication media (image, music, oral sources, etc.), [where] the translator’s task is complicated and at the same time constrained by the latter” (Mayoral et al., 1988, p. 356). Dubbing is characterized as having the highest degree of constraint (Ibid.). The AVT-inherent constraints can be formal (all types of synchronisation to visual subtext); content-related (correspondence of the textual and visual subtexts); texture-related (balancing the coherence of the textual and visual subtexts); and semiotic (decoding the meanings of macrosigns and microsigns present in the text and presenting them in the verbal narration, the only narration for the translator) (Chaume, 1998).

Today, we may argue that the relocation of translation practices to the online environments has enabled a rapid increase in the number and variety of constraints. For example, a YouTube page is a multimodal text largely composed of remediated multimodal “packages” (Benson, 2017). This means that the inherent multimodality of audiovisual text is superposed on the multimodality of OSM architecture, which leads to a highly complex nature of OSM AVT.

At the same time, being functional ecosystems, OSM bring a large number of affordances into the picture. In new media studies, the concept of affordances, borrowed from the field of ecological psychology and design studies, is currently considered a key term for analysing the relationship between human and non-human agency, between the media technologies and their users (Hopkins, 2015). Although some technological affordances can potentially be synonymous with constraints for AVT, a number of emergent institutional and social affordances are actively embraced by the user-translators, as discussed further in the paper. For example, the norms of the YouTube environment allow for the peaceful co-existence of institutionalized subtitling mechanisms (YouTube Translator Environment Toolkit) and video revoicing practices, including voice-over, narration, free commentary and lip-synchronized dubbing. The most common type of revoiced videos, voice-overs or half-dubbed videos, are published on specific UGT-focused channels, with or without the permission of the source content owner. This fact links UGT practices to a wider phenomenon of digital remix culture.

This paper describes a study of UGT in the YouTube ecosystem in light of the constraints and affordances reflected in the user-translators’ creative strategies.

Problem Statement

Audiovisual translation of specific OSM genres, done by OSM users for the same OSM, holds important implications for translation studies as a discipline. The qualitative shifts in sociocultural and communication patterns that involve translation practices in online social media environments are currently under-researched. As digital platforms and new types of media engender an inherently new multimodality of AVT, it is important to describe the relationship of AVT-inherent constraints against the backdrop of the constraints and affordances of OSM, and to identify how they are reflected in the characteristics of the user-translators’ contribution streams at the macro-level and in the translators’ creative language use at the micro-level.

The transparency of online ecosystems allows translated audiovisual products to travel across geographical borders and between linguistic communities. Thus, research into OSM translation is also a way to record the influence of social media on the international circulation of ideas and linguistic change.

Research Questions

This paper explores the following questions.

  • What constraints do the YouTube platform design and the corresponding social network dynamics impose on the user-translators?

  • What affordances of YouTube as a medium add to the unique nature of OSM-specific user-generated AVT?

  • How does the relationship of constraints and affordances influence user-translators’ creative strategies?

Purpose of the Study

The study reported in this paper aims to empirically track the interrelation of constraints and affordances in the case of YouTube-mediated translation practices, and gain insights into how a multifaceted socio-technological environment can shape user-translators’ creative strategies.

Research Methods

The issue raised in this paper constitutes an area of interest developed as part of a wider research project studying the development of UGT in the Russian-language segment of YouTube. Accordingly, the following analysis is theoretically informed by Bourdieusian translation sociology (Hanna, 2016) and draws on Levina and Arriaga’s (2014) understanding of the concept of online field and their model of status production in UGC platforms.

The data collection methods I used to build a macro-perspective include longitudinal (15 months) participant observation and documentation of the dynamics of the ten most popular UGT-focused YouTube channels featuring AVT into Russian; discursive analysis of the related meta-texts (meta-translations); and semi-structured email interviews with user-translators. The micro-analysis of translators’ creative language use draws on the comparative analysis of bi-texts featuring original scripts of the most popular videos on the top-10 channels and the corresponding translations into Russian.


To theorize the interplay of constraints and affordances of an OSM environment, in this paper I document the strategies undertaken by a user-translator running A (for the purposes of the study, the names of the channels have been changed), one of the most popular and fast-growing Russian-language YouTube channels featuring revoiced (half-dubbed) content originally published in the English-language segment of YouTube. The source videos translated by A represent various non-scripted situations of interpersonal communication where the narrator, or narrators, speak directly to the viewer. With regard to genre and register, the source videos can be characterized as traditional YouTube conversational first-person videos, or vlogs, rich in instances of vernacular and creative language use (puns, idioms, cultural references). The videos are thematic and therefore feature discussions of specific topics of a hobbyist nature, which makes terminology, jargon and culture-specific concepts one of the key challenges faced by the translator.

The first choice which reflects the influence of the OSM affordances on the translator’s strategies is the preference for voiceover over subtitling. According to Levina & Arriaga’s (2014) model of status production on user-generated content platforms, a producer’s status in the field hierarchy (or their popularity) is dependent, among other things, on the characteristics of their contribution stream (in Table 1 below, this constitutes a type of OSM-specific environmental constraint). This means that if a contributing user wishes to gain popularity, their content has to appeal to the consumer’s taste. Thanks to another basic affordance of YouTube, the transparency of the field hierarchy (subscriber and viewing statistics are present on every channel page), aspiring YouTubers can prototypicalize their content to the dominant mode of production. The vlog remains the most popular genre on the platform, with its key characteristics of directness, strategic authenticity and conversational mode of delivery. Voiceover done in an energetic, cheerful and emotional “YouTube voice” (see e.g. https://www.theatlantic.com/technology/archive/2015/12/the-linguistics-of-youtube-voice/418962/) retains the orality of the original and therefore makes the translation more prototypical and appealing to the viewer. This, in turn, underlines another link between the AVT-inherent texture constraint, OSM environmental constraints and the affordance of channel analytics and statistics. Pre-fabricated orality has always been a challenge for audiovisual translators. Since a user-translator’s status in the OSM hierarchy depends on the audience approving of the product, translators may use their access to channel analytics to attune their linguistic choices to the gender and age spectrum of the potential audience. For example, A adds extra discursive markers to keep the narrative fluent and organic.

The A channel, with over 700 thousand subscribers, is one of the top five UGT-focused channels in my list. Having successfully obtained the permission of the original content’s owner (according to YouTube terms of use, translations are regarded as a type of derivative content: “You need a copyright owner’s permission to create new works based on their original content. Derivative works may include fanfiction, sequels, translations, spin-offs, adaptations, etc. You’ll probably want to get legal advice from an expert before uploading videos that are based on the characters, storylines, and other elements of copyright-protected material” (https://support.google.com/youtube/answer/2797449?hl=en)), the channel under consideration publishes Russian-language versions of already popular serial English-language content created specifically for YouTube. As the user-translator notes, the celebrity status of the original videos plays a part in growing a UGT channel’s audience. This macro-strategy utilizes the same affordance of transparency of the field hierarchy, together with the affordance of openness. Embracing the affordance of openness also lets the user-translator overcome YouTube’s legal constraint of copyright protection.

YouTube’s multimodality, built on the platform’s modularity, can be regarded as a major challenge and a functional constraint for the translator: the content, texture and semiotic constraints of AVT are doubled, since the user has to tailor his/her translations to YouTube trends and keep the strategies attuned to the instant feedback of the audience featured in the comment section. Nevertheless, modularity can be embraced as an affordance as well. In A’s case, every video published on the channel is accompanied by a text in the description box, or a meta-translation, which features a discussion of the translator’s work. A more detailed description of the translation strategies can be found on the user-translator’s blog, which is also hyperlinked (connectedness) in the description box.

The last creative strategy I would like to discuss in this paper potentially draws on the affordances of the editorial freedom of every YouTube creator and the possibility of content personification. All the AVT-specific constraints are grounded in the understanding that “the translator can translate the text or speech (sometimes not even completely) while all the other media of the message remain untouched” (Mayoral et al., 1988, p. 359) and “words can be changed [...] but images cannot” (Chaume, 1998, p. 21). OSM translation, however, is inherently linked to a wider phenomenon of digital remix culture: AVT products can potentially be manipulated in a variety of ways, ranging from visuals to ideology. A’s translations, however, are extremely loyal to the original, which can also be explained by the interplay of the constraint of community oversight and the affordance of openness: the translator appreciates the original owner’s support and capitalizes on the social ties to the fellow members of the YouTube community.

A variety of technological, social and institutional social media affordances embraced by the user-translators make OSM-specific AVT different from AVT that is not embedded in a networked socio-technological environment. The presence of more than one mode of delivery engenders a number of creative strategies, such as extensive commentary in the form of metatextual descriptions and extensive blog entries, which are utilized to explain intertextual links, educate the viewer and build a network of content-specific references to keep the audience engaged. Therefore, some of the OSM affordances cancel out a certain number of AVT-inherent constraints.

The lists of constraints, affordances and translators’ strategies presented in this paper are by no means exhaustive. The reported study is an attempt to illustrate the mechanism of translation mediatization: the empirical observations show the way user-generated translation evolves as a practice when amalgamated with the socio-technological ecosystem of social media. The OSM-inherent constraints, such as status production mechanisms, audience feedback, strong community oversight and platform-specific legal restrictions, shape the content streams of UGT-focused channels, directing which products get translated and how they are translated. Future research is needed to further define the nature of the interplay between OSM constraints and affordances in light of user-generated translation research.


This research is supported by the Russian Science Foundation (RSF) (Project No. 16-18-02032).


