ML for Speech

Applying machine learning to speech

About

ML for Speech aims to bring machine learning to speech and audio. We're building SpeechToolkit, an all-in-one toolkit for ML in speech!

SpeechToolkit

SpeechToolkit is an all-in-one, end-to-end toolkit for text-to-speech, automatic speech recognition, voice conversion, and more. Instead of writing many lines of custom code for each model, you can use all of them through one unified Python API.
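To illustrate what a single entry point over several speech tasks can look like, here is a minimal, self-contained sketch of the general pattern. It is not SpeechToolkit's actual API: the names `register`, `run`, and the dummy `tts` / `asr` backends are hypothetical and exist only for this example. See the SpeechToolkit website for the real interface.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a task name to the function that handles it.
# None of these names come from SpeechToolkit; they only illustrate the pattern.
_BACKENDS: Dict[str, Callable[[str], str]] = {}


def register(task: str):
    """Decorator that registers a backend function under a task name."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        _BACKENDS[task] = fn
        return fn
    return wrap


@register("tts")
def _dummy_tts(text: str) -> str:
    # Stand-in for a real text-to-speech model call.
    return f"<audio for: {text}>"


@register("asr")
def _dummy_asr(audio_path: str) -> str:
    # Stand-in for a real speech recognition model call.
    return f"<transcript of: {audio_path}>"


def run(task: str, payload: str) -> str:
    """Single entry point: look up the backend for the task and call it."""
    if task not in _BACKENDS:
        raise ValueError(f"unknown task: {task!r}")
    return _BACKENDS[task](payload)


if __name__ == "__main__":
    print(run("tts", "hello"))
    print(run("asr", "sample.wav"))
```

The point of the pattern is that callers use one function regardless of which model sits behind a task, so adding a model means registering one more backend rather than changing calling code.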

SpeechToolkit supports many third-party open-access models, as well as some models developed by ML for Speech. Most of these models are available through the SpeechToolkit package, but we have also packaged many of them individually in case you don't want to use the full SpeechToolkit library.

You can learn more about SpeechToolkit and how to install it on the SpeechToolkit website.

Packages

Most of these packages are integrated into SpeechToolkit, but you can also install and use each of them on its own.

LVC

Our unofficial package for LVC-VC.

GitHub Repository

NS3VC (NaturalSpeech3 Voice Conversion)

Our unofficial package for Amphion's NaturalSpeech3 Voice Conversion implementation.

GitHub Repository