Localization Glossary

The Localization Committee has put together a glossary of terms commonly used within the industry.




Glossary Terms


ADR – “Automated Dialogue Replacement” or “Additional Dialogue Recording”. The process of re-recording dialogue with the original actor (or a replacement actor) after filming, either to improve audio quality or to change the originally scripted dialogue.

Adapter/Adaptation – Once a script is translated, it must be adapted for synchronization and cultural context so that the audience has the impression the actor is speaking in their language. The adapter is the professional who performs this work. A global audience deserves to experience dubbed content in a way that authentically represents their culture, traditions, values, and native language.

Algorithm – A procedure or formula for solving a mathematical problem, generally based on a series of specific instructions.

Artificial intelligence – The development of computer systems capable of performing tasks that usually require human intelligence.

Assignment of rights – A contract between a performer and a content producer (usually with the language service provider, or LSP, as an intermediary) that assigns the usage rights for the performer’s work on a given project in exchange for the fee paid for the performance (e.g., the actor’s performance can be used for theatrical release, home entertainment, marketing, social media, streaming, etc.). The scope and duration of the assignment of rights are often a point of negotiation between the actor and the producer/distributor.

Audio description – A narration track to be played with a program to describe and provide information on visual elements for the visually impaired.

Audio layback – In audio/video production, layback is the process of laying the audio back into the video to create the final product. On video projects it is common for audio and video to be edited separately (sometimes in separate facilities in different states or countries).

Audio lip sync – The process of achieving lip sync through manipulation of the audio track, whether by means of AI or manual editing.

Audio mixing – The process of combining and adjusting different audio elements, including dialogue, music, and sound effects, and balancing their levels to obtain a natural, clear-sounding mix that meets the desired quality and cultural standards. Audio elements may also be positioned spatially to create a more immersive experience. When mixing the localized voices with the M&E, the audio engineer applies, for example, the proper voice treatments, panning, and perspective to the dialogue to match the original version.


Back translation – Also called reverse translation; the process of re-translating content from the target language back into its source language in literal terms.


Casting – The process in which a dubbing director chooses the voice actors who will play the roles in a production (movie or television). The cast actor is typically a native speaker of the dubbed language and has a tone, texture, age, and vocal weight similar to those of the original actor.

Cast list – Complete list of all actors (voice or live action) that perform in a program.

Creative letter / dubbing guide – The bible of the project for localization purposes. It describes the dubbing strategy and includes a synopsis of the story; a description of the main characters, their voice types, and some information about the original actors; translation and adaptation notes on important cultural matters and slang; information on vocals and ditties; foreign-language notes; the casting list and word count; legal information; and M&E notes.

Conform – The process of synchronizing video, audio, and/or timed text.

Crowd – Like Walla, but may require localization (see SMPTE ST 377-41).


Data – A collection of information recorded on a specific medium in a form suitable for permanent storage, transmission, processing, and analysis.

Deepfake – Manipulated or synthetically generated media content (video, audio, image, text, etc.), generated to seem authentic to the source using artificial intelligence or machine learning and without the informed consent of the subject.

Delivery – How an actor performs their lines, including energy, pace, force, tone, and feeling.

Dialogue – The words spoken by the actors in a movie or television program, as written in the script.

Director (dubbing) – The director hired in the local territory to make creative choices and direct the voice actors.

Ditty – A short, simple song, usually sung or hummed by a character in the program. Generally differentiated from a “song” by the absence of a musical composition accompanying the humming or lyrics. Ditties are generally re-recorded by a voice actor for a dub.

DME stems – “Dialogue, Music, and Effects”: audio stems that keep these elements separate, allowing adjustments to specific audio elements without affecting the entire mix.


Efforts – Additional sounds missing from an audio mix that have to be worked into it, such as vocal efforts (breaths, grunts, exertions) and various production sounds that end up on the dialogue track.

Established voice – Local territory actor who has consistently recorded dubbed audio for specific original version actor(s).


Foley – Sound effects created to mirror on-screen action. In localization, Foley may be recorded to replace culturally specific sounds, but more often it is recorded when the quality of the original audio elements in the source content does not meet the desired standards for the localized version.

Forced narrative – A timed text file that translates plot-pertinent on-screen text, or dialogue spoken in a language other than the original language, so that the audience can understand a scene. On-screen text may include signs, plaques, or any other wording shown in a scene.

Full mix – The final audio mix that combines all necessary audio elements, including dialogue, music, sound effects, and any other required components, balanced and adjusted to achieve the desired sound quality and overall audio experience in formats such as stereo and surround sound.


Generative AI – A branch of artificial intelligence that focuses on creating new and original content using models trained on existing data.


Home studio – A studio that a voice actor has designed and built in their home so they can record remotely. Home studios typically meet requirements similar to those of a professional studio, most importantly having no external noise that could be picked up in the recording. Prefabricated home studios called “whisper booths” are commonly used.


Insert list – Documentation of on-screen text that will require translation to the target language for graphic or forced narrative inclusion.



KNP – “Key Names and Phrases”. A list typically created so that recurring names, phrases, and places are translated consistently, allowing the audience to recognize them whenever they appear throughout a movie or across several episodes.
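KNP consistency can be spot-checked automatically at a basic level. The sketch below assumes a hypothetical KNP list stored as a Python dict (source term to approved translation) and script lines already aligned as (source, target) pairs; `check_knp` and `KnpIssue` are illustrative names, not part of any particular localization tool. Matching is a naive case-insensitive substring test, so it will miss inflected forms and may produce false hits.

```python
from dataclasses import dataclass


@dataclass
class KnpIssue:
    line_no: int       # 1-based index of the aligned line pair
    source_term: str   # key name or phrase found in the source line
    expected: str      # approved translation from the KNP list


def check_knp(pairs: list[tuple[str, str]], knp: dict[str, str]) -> list[KnpIssue]:
    """Flag lines whose source mentions a KNP term but whose translation
    does not contain the approved rendering (naive substring match)."""
    issues = []
    for i, (source, target) in enumerate(pairs, start=1):
        for term, approved in knp.items():
            if term.lower() in source.lower() and approved.lower() not in target.lower():
                issues.append(KnpIssue(i, term, approved))
    return issues


if __name__ == "__main__":
    # Hypothetical KNP entry and translated lines, for illustration only.
    knp = {"Silver Harbor": "Puerto Plata"}
    pairs = [
        ("We sail for Silver Harbor.", "Zarpamos hacia Puerto Plata."),
        ("Silver Harbor is burning!", "¡El Puerto de Plata arde!"),
    ]
    print(check_knp(pairs, knp))  # flags line 2 only
```

A flagged line is only a prompt for a human reviewer; an adapter may have deliberately varied the wording for lip sync.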


Labials – Lip movements involved in forming speech. The most noticeable are the lips closing, or the lower lip touching the teeth, during the vocalization of dialogue, as occurs with the consonants M, B, P, F, and V. The term also covers visible articulations of speech involving the teeth (dentolabial) and the tongue.

Lectoring (Lektoring) – A dubbing style mainly used in Poland, in which a single actor (the lector) reads all the translated dialogue for every character in the program.

Line – A sentence or phrase from a character’s dialogue.

Lip sync – Matching dialogue to the mouth movements of the character speaking. The dubbed line has to match the number of syllables and the labials of the original.
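The syllable-and-labial criterion can be turned into a rough pre-check for adapters. This is a minimal sketch under loud assumptions: it estimates syllables by counting vowel groups in the spelling and treats the letters m, b, p, f, and v as labials, which only approximates pronunciation (silent letters, diphthongs, and language-specific rules are ignored). `profile`, `sync_gap`, and `SyncProfile` are hypothetical names; real lip-sync work is always checked against the picture.

```python
import re
from dataclasses import dataclass

# Spelling-based heuristics (assumption): real checks work on phonemes.
VOWEL_GROUP = re.compile(r"[aeiouyáéíóúàèìòùâêîôûäëïöüãõ]+", re.IGNORECASE)
LABIALS = set("mbpfv")  # lips close (m, b, p) or lower lip meets teeth (f, v)


@dataclass
class SyncProfile:
    syllables: int
    labials: int


def profile(line: str) -> SyncProfile:
    """Estimate syllables (vowel groups per word) and labial letters in a line."""
    words = re.findall(r"[^\W\d_]+", line.lower())  # runs of letters only
    syllables = sum(max(1, len(VOWEL_GROUP.findall(w))) for w in words)
    labials = sum(ch in LABIALS for ch in line.lower())
    return SyncProfile(syllables, labials)


def sync_gap(original: str, adapted: str) -> SyncProfile:
    """Adapted minus original; SyncProfile(0, 0) suggests a close match."""
    o, a = profile(original), profile(adapted)
    return SyncProfile(a.syllables - o.syllables, a.labials - o.labials)
```

For example, `sync_gap("Maybe later", "Más tarde")` reports one syllable and one labial fewer in the adapted line, hinting that the adapter may need a longer phrase with an extra lip closure.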

Lip sync adaptation – The process of achieving lip sync through changing the text of the dialogue.

Localization list – A list that describes the strategy to handle specific lines, vocals, or onscreen text.

Loop – A sequence that includes the line or lines to be recorded in a take.  


M&E – “Music and Effects” track or tracks pre-mixed without the original dialogue to be used as an international bed to do a final mix with the localized dialogue.
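At its simplest, the final mix of localized dialogue over the M&E bed can be illustrated as a gain-weighted, sample-by-sample sum. This sketch assumes mono floating-point samples in the range -1.0 to 1.0, equal sample rates, and stems already conformed to the same length; `mix_with_me` and `db_to_gain` are illustrative names. Real mixes also involve panning, EQ, voice treatments, and limiting rather than the hard clip used here.

```python
from typing import Sequence


def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def mix_with_me(dialogue: Sequence[float], m_and_e: Sequence[float],
                dialogue_db: float = 0.0, me_db: float = 0.0) -> list[float]:
    """Sum a localized dialogue stem onto an M&E bed, sample by sample.

    Assumes mono, same sample rate, already conformed (time-aligned)
    stems with samples in the range -1.0..1.0.
    """
    if len(dialogue) != len(m_and_e):
        raise ValueError("stems must be conformed to the same length")
    g_d, g_me = db_to_gain(dialogue_db), db_to_gain(me_db)
    mixed = []
    for d, e in zip(dialogue, m_and_e):
        s = g_d * d + g_me * e
        mixed.append(max(-1.0, min(1.0, s)))  # hard clip to the valid range
    return mixed
```

For example, `mix_with_me(dialogue, m_and_e, me_db=-6.0)` lowers the M&E by 6 dB (roughly half its amplitude) before adding the dialogue.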

M&E helper/option – Partner audio tracks to be used with the M&E: audio not included in the M&E mix, such as foreign dialogue, vocals/singing, and efforts, provided as needed to give the localization mixer flexibility.

M&E map – A document that details the mix instructions for a feature film, including plug-in details and notes on how to use the M&E for foreign-version mixes. It lists all material across the optional M&E tracks and any filmmaker notes on their usage.

Machine learning – Algorithms that learn from data rather than following explicitly programmed rules. Machine learning is more advanced than rule-based AI: its behavior is derived from patterns found by comparing data rather than from prior instructions, and the technology relies heavily on statistics.

Media (video/image/audio/text) synthesis – A process of producing new media content with the help of generative AI.

Music cue sheet – A log of all the music used in a production.


Narration – Spoken commentary that describes the story.

Nearfield mix – Also known as a “home theater mix”. A rebalanced version of the original theatrical mix intended for the home listening environment. Music and effects are typically lowered and dialogue raised and balanced so that viewers do not have to keep adjusting the volume while watching Blu-ray or streaming content.


OV – Can refer to Original Version or Original Voice. In an audio context, it refers to the mixed audio of the original voice plus the original M&E.


Phonetic dub – A dubbing style used in Japanese dubs. When a foreign language other than English is spoken in the OV audio, a voice talent pronounces it as it sounds. A phonetic dub is used (a) when a separate foreign-dialogue track is not available, (b) when the foreign dialogue in the OV audio overlaps with English dialogue, and (c) for voice-matching purposes. For example, the Italian word “Ciao” would be rendered phonetically as “Chau”.

Pick-up – Additional dialogue recording required because of alternate versions or edits, or because lines are missing from the original recording.

Pivot language – An intermediate language whose translation is used as the reference for adaptation into other languages. English is a common pivot language: content in one foreign language is translated into English and then from English into another foreign language.

Print master – Also known as ‘PM’, the print master is the final, approved version of the dubbed or localized audio track. Complete and ready for distribution, the PM includes all the necessary audio elements, such as dialogue, music, sound effects, and any other required components. It has undergone quality control, editing, and mixing to ensure that it meets the desired audio quality and specifications for the intended audience, and it is the reference for replication and distribution in the target language or region.



Raw dialogue tracks – Actor’s recordings without any mixing applied.

Remote dubbing – Dubbing done via a platform that connects actors and directors working from locations other than a studio.

Rights clearances – Legal authorization to use specific elements of the original content in the localized version, such as human vocalizations or dialogue, or to translate songs, poems, or other original literature.

Rythmo band – A scrolling band that displays the dialogue text in time with the rhythm and cadence of the original to facilitate delivery and lip sync.


Sensitivity events list – List of specific visual, written, or spoken situations in the story that could represent a legal risk or a cultural discomfort that might need to be addressed in the adaptation or will require advice or authorization from the client on how to handle them.

Signatory – An entity or person authorized to sign agreements. In our industry, these are typically the agreements with unions or talent.

Speech separation – The process of separating dialog from M&E using AI or other means to achieve an isolated Dialog Stem and M&E to be used for various purposes.

Softened dialogue – Translation and adaptation to use alternate dialogue in the localized dub or subtitle when offensive material may be present, including swearing, off-color jokes, etc.

Sound engineer – During a dubbing session, a sound engineer operates the software that captures the voice actor’s performance in a dubbing studio. The engineer controls the recording, editorial, and playback of the recordings for the director.

Stakeholders in the entertainment and localization industry – Entertainment platforms and distribution channels that support the film, television, audio, consumer electronics, and IT industries; actors, writers, and guilds; the audience.

Star talent – Local territory performer who is considered famous/well-known in a territory as a performer (musician, actor, online personality, etc.). Cast to encourage local interest in a film.

Stems – Discrete or grouped collections of audio sources mixed together, such as dialogue, vocal, or effects stems.

Synthesis – An umbrella term for artificially created media, including text, speech, image, and video.

Synthetic media in entertainment and localization – The use of AI and ML-generated media to support and enhance existing processes of the entertainment industry and to develop new ones.

Synthetic media 1 – An all-encompassing term for different sorts of automatic and artificial media productions, including video, audio, and text manipulation. Synthetic media is commonly based on artificial intelligence software.

Synthetic media 2 – Refers to any media (voice, image, video, text, audio) created or modified by algorithmic means through the use of generative AI algorithms.


Target language – Language in which the localization is required.

Text synthesis methods – The two main methods in which new text is generated are:

  • STT (speech to text) – AI-generated text that transcribes spoken dialogue.
  • MTT (machine-translated text) – AI-generated text in one language that translates text from another language.

Transcription – Process where the audio is converted to text.

Translator – A professional who translates a project from one language to another.


UN style – A translation style commonly used in United Nations meetings, in which the dialogue is interpreted while the original audio is heard, following it closely. In our industry, this is also known as Newsreel, UN, or VO dubbing.


Video lip sync – The process of achieving lip sync through manipulation of the video track, whether by means of AI or manual editing.

Virtual reality – Technology that immerses the user in a computer-generated, simulated environment. Technology that collects, analyzes, and applies data to place artificial layers over physical reality, creating ‘hybrid’ worlds that are simultaneously physical and virtual, is more precisely called augmented or mixed reality.

Vocals – Recordings that involve singing.

Voice actor – Actor who performs exclusively with their voice for audiovisual projects.

Voice cloning – A process that uses generative AI technology to create a synthetic voice, either in real-time or offline. This method makes it possible to simulate the unique characteristics of a person’s speech, with the objective of completely matching the original target voice.

Voice-over – A dubbing style used predominantly for documentaries, reality, and unscripted content. The dubbed dialogue is recorded without lip sync, reactions, or vocals; it starts slightly after the original dialogue and ends earlier than, or at the same time as, the original. In the mix, the original dialogue is placed in the background, where it can still be heard when the speaker is seen on camera, while the dubbed language is placed in the foreground as an interpretation of what is being said.

Voice synthesis – A voice created by a computer by means of generative AI.

Voice synthesis methods – The two main methods of voice synthesis are:

  • TTS (text to speech) – AI-generated speech produced from a textual representation using machine learning methods.
  • STS (speech to speech) – AI-generated speech that uses speech (not text) as the source to generate speech in another voice.

Voice test kit – Set of materials prepared with the purpose of recording auditions for specific characters. Also referred to as “VTK”.


Walla – A term used in radio, film, and television sound recording to describe the murmur of a crowd in the background. Walla may need localization if the audio volume of Walla in M&E is too low or when some English dialogue in the Walla is noticeable.