The Intuition Behind Voice Cloning with 5 Seconds of Audio

A guide to the paper “Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis”

Nobody wants to listen to a robotic text-to-speech (TTS) program drone on about whatever.

The authors propose a new technique (often called SV2TTS, short for Speaker Verification to Text-To-Speech) that takes a few seconds of a sample voice and generates completely new audio in that same voice.

How do the authors do this?

Let’s take a look.

In this article, we will cover the intuition behind the voice cloning technique developed by Jia, Zhang, Weiss, et al. We will not be implementing the model itself this time.

The SV2TTS model is composed of three parts, each trained individually.

This allows each part to be trained on independent data, reducing the need for high-quality multispeaker data.

The first part of the SV2TTS model is the speaker encoder.

The speaker encoder’s job is to take some input audio from a given speaker (encoded as mel spectrogram frames) and output an embedding that captures “how the speaker sounds.”

The speaker encoder does not care about the words the speaker is saying, or about any noise in the background. All it cares about is the voice of the speaker, e.g., high- or low-pitched voice, accent, tone, etc.

All of these features are combined into a low-dimensional vector, known formally as a d-vector, or informally as the speaker embedding.
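To make the shape of this operation concrete, here is a minimal sketch of turning a variable number of frames into one fixed-length vector. The real speaker encoder is a trained neural network; the mean pooling and L2 normalization used here are stand-in assumptions, and `toy_speaker_embedding` and the frame values are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def toy_speaker_embedding(frames):
    """Assumed stand-in for a learned encoder: average the frames, then normalize.

    `frames` is a list of equal-length feature vectors, one per audio frame.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    mean = [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]
    return l2_normalize(mean)


# Hypothetical 3-dimensional "frames"; real mel spectrogram frames are much wider.
short_clip = [[0.2, 0.4, 0.1], [0.3, 0.5, 0.2]]
long_clip = short_clip + [[0.25, 0.45, 0.15], [0.25, 0.45, 0.15]]

emb_short = toy_speaker_embedding(short_clip)
emb_long = toy_speaker_embedding(long_clip)
```

Whatever the clip length, the output has the same size, which is what lets later stages treat “how the speaker sounds” as a single fixed-length input.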

As a result, utterances spoken by the same speaker will be close to each other in the embedding space, while…
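The idea of embeddings being “close” can be sketched with cosine similarity, which is one common way to compare such vectors and is assumed here for illustration. The three embeddings below are made-up values, not outputs of any real encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical speaker embeddings: two utterances from one speaker, one from another.
alice_1 = [0.9, 0.1, 0.3]
alice_2 = [0.8, 0.2, 0.35]
bob_1 = [0.1, 0.9, 0.2]

same_speaker = cosine_similarity(alice_1, alice_2)
different_speaker = cosine_similarity(alice_1, bob_1)

print(f"same speaker:      {same_speaker:.3f}")
print(f"different speaker: {different_speaker:.3f}")
```

With these toy values, the two utterances from the same speaker score much higher than the pair from different speakers, which is the property the speaker encoder is trained to produce.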
