DEFINITIONS for VOICE ACTORS

Voice actors today face a credibility challenge: how do you prove your work is authentically yours when AI can clone voices instantly? This standard draws a clear line between human performance and machine generation.

QUICK REFERENCE

  • VerifiedHuman label requires VH3 or higher on the Human-AI Spectrum

  • Essential question: "Who gave voice to this sound?"

  • Acceptable AI use: Noise reduction, audio cleanup, technical post-processing

  • Not acceptable: AI-generated or AI-cloned voices you claim as your performance

  • Human voice remains distinct from AI simulation

FOREWORD

The most important definition in the standard is vocal work: the selection and arrangement of essential vocal elements in a performance to be heard—narration, dialogue, or voice-over that has been performed, recorded, or broadcast.

 

This human creativity standard helps voice actors prove their work is not AI through human-made content certification that validates essential vocalization.

 

Voice acting is unique. Unlike writers, voice actors typically don't write their scripts—they perform them. The craft lies in bringing words to life through vocal performance. Voiceover must be heard to be experienced; even closed captions can only reference the words, not the inflection, emotion, and nuance that voice actors provide.

 

The human voice remains distinct from AI simulation. Modern voice production involves technical processes in which AI tools may be employed—from studio recording enhancement to post-production processing—but these tools support, rather than replace, the voice actor's performance. Our spectrum helps voice actors clearly communicate how their work relates to these AI-assisted production elements while maintaining the fundamental distinction between human- and machine-generated voices.

 

HUMAN-AI COLLABORATION LEVELS
 

Our 5-point Human-AI Spectrum provides AI transparency certification and human authenticity verification for voice actors. It shows how much of the work is human-performed versus AI-assisted in production. All VerifiedHuman-labeled work must meet or exceed Level 3 (VH3), meaning the voice actor performed the vocal work.

VH5 - >99% Human-Performed (<1% AI-Generated)

  • Vocal content performed and produced entirely without AI assistance

  • Includes unique vocal nuances, emotional depth, and personalized expression

  • No traces of automated processes in production

VH4 - Mostly Human-Performed

  • Predominantly human-performed with minimal AI assistance in production

  • May use AI for technical enhancements (noise reduction, audio cleanup)

  • Human vocal expression dominates

VH3 - Balanced Human-AI Collaboration

  • Equal partnership between human performance and AI-assisted production

  • Harmonious integration of human voice and AI-enhanced technical elements

  • Significant human performance input maintained

Work below VH3 is not eligible for the VerifiedHuman label.
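
The eligibility rule above can be sketched in a few lines of Python. This is only an illustration, not part of the standard: the names `VHLevel`, `MINIMUM_LABEL_LEVEL`, and `is_label_eligible` are hypothetical, and because this section describes only VH3 through VH5, the two lower levels appear as undescribed placeholders.

```python
from enum import IntEnum


class VHLevel(IntEnum):
    """Points on the 5-point Human-AI Spectrum (higher means more human-performed)."""

    VH1 = 1  # placeholder: not described in this section
    VH2 = 2  # placeholder: not described in this section
    VH3 = 3  # balanced human-AI collaboration
    VH4 = 4  # mostly human-performed
    VH5 = 5  # >99% human-performed


# Threshold stated in the standard: the label requires VH3 or higher.
MINIMUM_LABEL_LEVEL = VHLevel.VH3


def is_label_eligible(level: VHLevel) -> bool:
    """Return True if work at this level may carry the VerifiedHuman label."""
    return level >= MINIMUM_LABEL_LEVEL


for level in VHLevel:
    status = "eligible" if is_label_eligible(level) else "not eligible"
    print(f"{level.name}: {status}")
```

Using an ordered enum keeps the threshold in one place, so the check is a single comparison.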

In the spirit of this standard, the question remains: Did you perform it, or did AI generate it? We believe the decision to create and present vocal work as one's own should always rest with the voice actor. We therefore urge all voice actors to authenticate their work as part of the performance process, so that the final product is unique and distinctive.


We acknowledge the hard work of skilled professionals, women and men who have honed their skills and mastered this craft in various languages. We appreciate your human voices and are grateful to help you mark them as your own.

DEFINITIONS

Here are definitions of words and ideas used in the Standard for Voice Actors.

Standard

A specific description of human behavior

 

Voice Actor/Voiceover Artist

The narrator/performer of vocal work
 

Represent

Present, share, or show work with others

Vocal work

The narration or production of essential vocal elements for live performance, broadcast, or recording

Team

A group of people working together

 

Intellectual property

A work or invention that is the result of creativity, such as a manuscript or a design, to which one has rights and for which one may apply for a patent, copyright, trademark, etc.

 

Essential

The fundamental elements or characteristics of something

 

Essentially vocalized

When a voice actor narrates or produces essential vocal elements in a meaningful way to create vocal work (see Essential Vocalization below)

Essential vocal elements

Sounds made by the vocal tract, which can be heard as talking, singing, laughing, crying, screaming, humming, mumbling, or shouting

Human

Noun: a human person; adjective: from a human person

 

Generative AI

Shorter definition: machines that create novel (new) content

Longer definition: generative artificial intelligence (AI) describes algorithms (such as ChatGPT) that can be used to create new content, including audio, code, images, text, simulations, and videos.

 

Machine learning

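"Systems that act like humans"; more precisely, systems that learn patterns from data rather than following explicitly programmed rules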

 

Other generative processes

Other processes involving AI or machine learning to create novel (new) content

OTHER DEFINITIONS

Other definitions of words and ideas related to the Standard for Voice Actors are here.

Accent

The prominence of a syllable; characteristic pronunciations of one spoken language that can be heard in another

Artificial Intelligence (AI)

A field that combines computer science and robust datasets to enable problem-solving. It also encompasses the machine learning and deep learning subfields, which are frequently mentioned in conjunction with artificial intelligence. These disciplines involve AI algorithms that aim to build expert systems capable of making predictions or classifications from input data.*

AI language modeling (or large language models, LLMs)

Systems trained on large amounts of natural language text. Large language models use deep neural networks, such as transformers, to learn from billions or trillions of words and to produce texts on any topic or domain. Large language models can also perform a range of natural language tasks, including classification, summarization, translation, generation, and dialogue. Some examples of large language models are ChatGPT, Claude, and Gemini.**


Audio output

The production or reproduction of sound, live or recorded

Emphasis

Special and significant stress of voice laid on particular words or syllables; stress laid on particular words by means of position or repetition

Interpretation

Way of understanding or explaining the meaning of something

Language

Communication by voice in the distinctively human manner, speech; a body of words and the systems for their use common to people who are of the same community or nation, the same geographical area, or the same cultural tradition

 

Inflection

The rise and fall of pitch in the voice; changes in tone used to convey meaning, emotion, or emphasis. Inflection helps distinguish questions from statements, conveys sarcasm or sincerity, and brings emotional depth to vocal performance.

Morphological vocal processing

When the features and structures of a vocal sound's characteristics are described and processed as words


Pause

A temporary stop or rest

Personal, original idea

An idea representing a specific human being's unique insight or experience in the world

Pitch

Height or depth of a tone or sound, depending upon the relative rapidity of the vibrations by which it is produced

 

Pronunciation

Producing the sounds of speech, including articulation, stress, and intonation, often with reference to a standard of correctness or acceptability

Rate/Speed

A measure, quantity, or frequency, typically one measured against some other quantity or measure; the rate of motion or progress

Representation and description

Representing a sound in a way that can be analyzed by a computer, like numbers or words that have been assigned to specific sounds and features of recorded sounds

Sound

Vibrations transmitted through the air or another medium and experienced through hearing

Text-to-voice generation

Also known as text-to-speech (TTS); using written or spoken words to prompt an AI generator to create vocal work

Timbre

The characteristic quality of sound, independent of pitch or volume, from which the manner of production can be inferred, dependent on the relative components of resonant frequencies

Tone

A sound considered with reference to its quality, pitch, strength, and source; the quality or character of a sound

 

Values

Principles or standards of behavior
 

Values-based

Relating to principles, values, or ethical assumptions that motivate human behavior

Voiceover

The voice of an offscreen narrator, announcer, speaker, or reader, as in a commercial; the use of such a voice

COMMON USES

Here are commonly accepted ways AI is used in voice production.

 

Audio cleanup and noise reduction uses AI to remove background noise, breathing sounds, mouth clicks, and environmental interference from recordings. This is widely accepted.

 

Pitch and timing correction uses AI to adjust vocal pitch or timing while maintaining the natural quality of the human voice. Standard in professional production.

 

Audio mastering and enhancement uses AI to balance levels, apply EQ, compression, and other effects to improve overall sound quality. Industry standard.

 

Script generation and prompts uses AI to help generate ideas or draft scripts that the voice actor then performs in their own voice. The performance remains human.

 

Dubbing and lip-sync timing uses AI to help match vocal timing to visual content across languages. The voice actor still performs all dialogue.

INTERPRETATION

Here are questions voice actors commonly face when working with AI tools.

 

Q: Can I use AI to clean up my recordings and still qualify for VerifiedHuman?

 

ACCEPTABLE: Using AI for technical post-processing (noise reduction, EQ, compression, mastering) while you performed the entire vocal work. AI enhanced your recording; you created the performance.

 

GRAY AREA: Using heavy pitch correction or vocal processing that significantly alters your natural voice. Ask yourself: Is this still recognizably my performance?

 

NOT ACCEPTABLE: Using AI voice cloning or synthesis to generate vocal performances. If AI generated the voice, you didn't perform it—even if you wrote the script or provided the voice sample.

 

The question remains: Who performed it?

REAL-WORLD SCENARIO: Audiobook Narration with AI Enhancement

Sarah narrates a 10-hour audiobook. She uses AI to:

  • Remove background noise and mouth clicks (audio cleanup)

  • Normalize volume levels across chapters (mastering)

  • Reduce echo in her home studio (acoustic correction)

  • Apply light compression for consistency (production enhancement)

Sarah performed every word, character voice, and emotional inflection herself.

 

VERDICT: VH4-VH5 (Mostly/Entirely Human-Performed)

 

WHY IT QUALIFIES: AI assisted with technical post-production, but Sarah performed all vocal work. She is the essential vocalizer.
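
The three answers in the Q&A above, applied to Sarah's scenario, can be sketched as a small decision function. This is a minimal sketch and not an official tool: the tag names, the three groupings, and the function `assess_ai_use` are hypothetical, and the function assumes the voice actor performed the entire vocal work unless a voice-generation tag is present.

```python
from enum import Enum
from typing import Iterable


class Verdict(Enum):
    ACCEPTABLE = "acceptable"
    GRAY_AREA = "gray area"
    NOT_ACCEPTABLE = "not acceptable"


# Hypothetical tags; the groupings mirror the three answers in the Q&A.
TECHNICAL_POST_PROCESSING = frozenset({
    "noise_reduction", "audio_cleanup", "eq",
    "compression", "mastering", "acoustic_correction",
})
HEAVY_ALTERATION = frozenset({"heavy_pitch_correction", "heavy_vocal_processing"})
VOICE_GENERATION = frozenset({"voice_cloning", "voice_synthesis"})


def assess_ai_use(ai_uses: Iterable[str]) -> Verdict:
    """Classify a set of AI uses, taking the most restrictive answer that applies."""
    uses = set(ai_uses)
    known = TECHNICAL_POST_PROCESSING | HEAVY_ALTERATION | VOICE_GENERATION
    unknown = uses - known
    if unknown:
        # Refuse to guess about uses this sketch does not cover.
        raise ValueError(f"Unrecognized AI use tags: {sorted(unknown)}")
    if uses & VOICE_GENERATION:
        return Verdict.NOT_ACCEPTABLE  # AI generated the voice
    if uses & HEAVY_ALTERATION:
        return Verdict.GRAY_AREA  # ask: is this still recognizably my performance?
    return Verdict.ACCEPTABLE  # technical post-processing only, or no AI at all


# Sarah's audiobook: cleanup, mastering, echo reduction, light compression.
sarah = {"noise_reduction", "mastering", "acoustic_correction", "compression"}
print(assess_ai_use(sarah).value)  # prints: acceptable
```

A gray-area result is a prompt for the voice actor's own judgment, not an automatic rejection.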

ESSENTIAL VOCALIZATION

In vocal performance, artists use their minds and distinctive physical and verbal traits to bring human expressions to life, crafting unique soundscapes. These come alive in movies, radio, audiobooks, and more. We establish essential human vocalization with a simple question.


The essential question of vocalization is: Who (or what) gave voice to this sound?


RATIONALE
We hear and experience spoken vocalizations, either human voices or voices simulated by a machine.
The result is vocal work when a human's voice can be experienced live or in recorded media.

ASSUMPTION OF ESSENTIAL HUMAN VOCALIZATION

If a human (and not a machine) gives voice to words and sounds, then a human is the essential vocalizer.

Voice actors use VerifiedHuman's human-made content certification to prove their work is not AI-generated. Our human creativity standard, established in April 2023, provides AI transparency certification and human authenticity verification for all vocal work. Free to join. 190+ creators certified worldwide.
