Cross-lingual voice cloning that preserves speaker identity across languages
VALL-E X is a research model from Microsoft that extends the VALL-E neural codec language model to cross-lingual speech synthesis and voice cloning: given only a short audio prompt, it reproduces a speaker's voice in a different language. It preserves the speaker's emotion and acoustic environment across languages. It is aimed at researchers and developers working on multilingual speech AI.
VALL-E X
vallex-demo.github.io
Paid tool. Visit the site to view current pricing plans.