
Ethical AI at ReadSpeaker: Best Practices for the AI Voice Industry

What does ethical AI look like in today’s text-to-speech (TTS) industry? Find out with ReadSpeaker’s ethical guidelines for AI voice creation.

April 29, 2024 by Gaea Vilage

The debate over AI ethics rages on, but some things are beyond dispute whether you’re dealing with neural networks or fingerpaints.

You shouldn’t take someone else’s stuff without permission. You shouldn’t promise one thing and then do another. You shouldn’t make a product that hurts people.

At ReadSpeaker, we’ve been at the forefront of AI voice technology from the very start. We’ve considered the ethical implications of voice synthesis in all its permutations. Recently, we’ve watched as the industry’s eagerness to exploit a powerful new technology has begun to outpace its conscience.

In this article, we’d like to share our perspective on AI ethics in the text-to-speech (TTS) business.


What an AI Voice Really Is

First—just to make sure we’re all on the same page—let’s clarify what makes an AI voice an “AI voice.” Here’s a quick definition:

An AI voice is a synthetic model of human speech built on deep neural networks.

A deep neural network (DNN) is a machine learning architecture loosely inspired by the human brain. It uses multiple layers of interconnected processing units, called artificial neurons, to learn complex patterns from training data.

When you train the right DNN on a human voice, it produces audio signals that mimic that voice. We call the result a neural voice or an AI voice—“AI” because DNNs are a form of artificial intelligence.

The important takeaway here is that there’s a real person behind every AI voice. Synthetic voices are a distinctly personal form of intellectual property. Unfortunately, in the great AI gold rush, not every TTS provider is respecting that fact.

This disconnect may boil down to a common business model in the AI voice industry: Unleashing an AI voice on the open internet, as self-service AI voice generators do, fails to protect the rights of voice talent (and other stakeholders, as we’ll explain).

B2B vs. B2C AI Voice Generators and the Rights of Voice Actors

Self-service AI voice providers operate on a business-to-consumer (B2C) model. They sell AI voices to anyone. At ReadSpeaker, we only operate business-to-business (B2B). We work with other companies, not individual consumers.

This B2B model allows us to protect voice actors in ways B2C providers simply can’t. Our contractual agreements with voice actors and AI voice users ensure that a vocal likeness can only appear in narrow, approved use cases. We’ll unpack this idea in the paragraphs that follow.

Unethical Uses of AI Voice Synthesis

What determines whether an AI voice is “ethical” or “unethical”? Two things: how it’s built and how it’s used. Let’s call these upstream and downstream practices. Each can become a point of ethical crisis.

Upstream ethical violations revolve around data collection.

Remember, neural voices sound like the voice recordings you feed them for training. Where you get those recordings matters a lot.

In a world of podcasts and audiobooks, there’s plenty of data out there. It is possible to scrape audio data from any source, creating an AI voice without the speaker’s knowledge or consent. That’s clearly unethical—but people are doing it.

To see if a TTS provider is using AI ethically, start with a simple question: “Where did you get your data?”

There’s a second way TTS companies set the stage for unethical, and even illegal, data collection. Self-service B2C AI voice generators, or voice cloning services, allow users to create a synthetic voice from whatever audio recordings they upload.

These tools allow anyone to clone a voice, sometimes with just a few seconds of audio data. Most of our voices are all over social media. They’re also on an unknowable number of distant servers, as a consequence of virtual assistants, smart speakers, and voice-enabled apps that may record our interactions. In other words, we’re all vulnerable to predatory voice cloning with these tools on the loose.

Quick, homemade AI voices will never sound terrific (that takes a lot more data). But they’re good enough for political deepfakes and impersonation scams.

We’ll cover these topics more in our article about the ethics of voice cloning. For now, the key point is that creating an AI voice with unauthorized training data is almost always wrong.

Downstream ethical violations involve unauthorized use of a synthetic voice.

There are three primary stakeholders in the delivery of neural TTS:

  1. Voice talent: The speaker behind the training data
  2. The AI voice creator: A TTS provider like ReadSpeaker
  3. The TTS user: The organization that delivers synthetic speech to its audience

All three of these stakeholders should agree on appropriate uses of an AI voice. Deploying an AI voice outside of these approved scenarios—the “unauthorized use” we keep mentioning—can create serious harm. That harm hits each stakeholder differently.

How Unauthorized Use of AI Voices Harms…

1. Voice Talent

Voice actors supply the training data for most commercial TTS voices. If an AI voice creator clones an actor’s voice and doesn’t tightly control deployment, that actor can be ruined. After all, why hire someone you can emulate for free?

“My voice is who I am, but it’s also my livelihood,” a working voice actor told us. “If you take that, you take my income. If someone steals my voice, that’s it; I’m done.”


Without controlled deployment, voice actors in the TTS industry also face the risk of having their voices used in content they don’t approve of, from adult videos to hate speech. That can lead to moral injury as well as loss of income. You can see how vulnerable voice actors are to unauthorized TTS usage.

2. AI Voice Creators

Ethical creators of AI voices, including ReadSpeaker, are also harmed by uncontrolled AI voice proliferation. It takes time, money, and constant vigilance to deploy AI voices ethically.

Companies that don’t play by the rules seize an unfair competitive advantage at the same time they’re harming voice talent—and, potentially, the very customers they sell to.

3. TTS Users

This is the organization that delivers the AI voice to consumers. Companies might use an AI voice to improve digital accessibility, produce e-learning content, announce a delay on a train, or power a virtual assistant (to name just a few examples).

However you use it, that AI voice becomes part of your brand identity. Imagine the damage if that exact same voice shows up in illicit or illegal content. You may even face legal jeopardy if your provider also committed upstream ethical violations.

These ethical complaints, both upstream and down, aren’t speculation. People are out there acting unethically on both fronts.

At ReadSpeaker, we operate differently. We place ethics at the core of every decision we make.

Here’s how we create AI voices while protecting our stakeholders—and how we recommend others in the industry do the same.

Our goal at ReadSpeaker is to deliver the most lifelike TTS voices available while preventing any form of abuse or harm involving our work.

Based on the practices we’ve developed across more than 20 years in the industry—and while serving over 12,000 customers all over the world—here are our ethical guidelines for AI voice providers.

ReadSpeaker’s ethical guidelines for AI voice providers

1. Generate your own training data.

When you train your AI voice models, never use voice recordings without the permission of the speaker and/or their legal representative and IP rights holder. You may also need the approval of other creators, like audio engineers and voice coaches. Don’t scrape data from any source.

The best practice, both for quality and ethics, is to generate your own training data by creating original voice recordings. This gives all contributors a chance to agree on approved uses for the AI voice you’re creating. This step is essential for downstream protections, as we’ll see.

2. Always sign contracts with voice talent.

Contracts keep expectations crystal clear for all stakeholders. They’re crucial for protecting voice talent. Without voice talent, there’s no AI voice, so there’s a practical as well as an ethical dimension to this rule.

Your voice contract can and should include carve-outs for conflicts of interest. If a voice actor does a lot of work in radio, for instance, they may not want their synthetic voice used in radio commercials. Use this upstream contract to ensure downstream protections are in place.

3. Always sign contracts with AI voice users.

A contract with voice talent establishes the approved uses of an AI voice. A contract with the user of that AI voice—the TTS provider’s customer—enforces those approved uses.

The company using the AI voice also deserves protections. Companies don’t want a branded asset showing up elsewhere, for example. This downstream contract sets these rules in stone, preventing harm to all parties.

Both upstream and downstream contracts shape how you apply our next guideline.

4. Maintain control over AI voice deployment.

It’s the TTS provider’s responsibility to keep AI voices restricted to approved channels. Your role does not end with the creation of the voice; you must also control the systems by which your voices are delivered.

That’s the only way you can hold up your end of the contracts. If your voice ends up in the wrong hands, you can’t stop it from showing up in unauthorized use cases.

Build these protections into your technology. At ReadSpeaker, it’s technologically impossible for someone who doesn’t have a contract with us to use one of our voices.

We recommend this practice to all AI voice providers, for the protection of voice actors, TTS users, and all of society.

5. Build a business model around ethical behavior, not the other way around.

Some business models in the AI voice space make it difficult to follow the previous four guidelines. If you can’t protect your vendors and customers, however, it’s better to rethink your systems before launching. That’s true in any industry.

It’s not enough to offer warnings or terms of service, simply asking consumers not to abuse an AI voice generator. Protection from abuse must be built into the technology itself. For self-service voice platforms, that could involve digital watermarks, auto-rejection for well-known voices, and violation-reporting channels.

The best practice, however, is to avoid open access to AI voice generators in the first place. There’s simply no way to protect your stakeholders—including society at large—otherwise.

Ethical AI at ReadSpeaker: How Safety and Quality Support One Another

These guidelines are based on how we operate at ReadSpeaker. We’ve been a leader in speech synthesis technology for over two decades, and were among the first to provide commercial AI voices.

From the start, we applied our ethics-informed, contract-based procedures to AI voice synthesis. That’s made us a trusted name—not just among our customers, but in the voice acting community as well.

We’re proud to be known as a TTS company that treats voice talent right. That reputation has enabled us to do exciting work. For example, in 2022 we worked with actor Giancarlo Esposito to develop the exclusive AI voice for Sonos’ voice assistant, Sonos Voice Control.

That’s just one example of how ethical business translates into good business at ReadSpeaker.

Following our ethical guidelines also leads to higher-quality AI voices. We create original training data to protect the rights of stakeholders, it’s true. But we also do it because it leads to a better product.

Every ReadSpeaker neural TTS voice starts with an intensive voice recording process, including:

  • Custom TTS scripts
  • Top-notch voice talent
  • Expert voice coaching
  • Professional studios
  • Careful editing

Our process isn’t the quickest, but it protects voice actors and produces the best AI voices available. At ReadSpeaker, ethics and quality go hand in hand.

We invite every TTS provider to adopt ethical AI guidelines like ours so they can see the same benefits.

Don’t let ethical worries keep you from reaping the benefits of AI-powered TTS. Contact ReadSpeaker to get top-quality, ethically produced AI voices today.
