Examples
Text-to-Speech
from speechtoolkit.tts import SingleSpeakerStyleTTS2Model
model = SingleSpeakerStyleTTS2Model()
model.infer_to_file('Hello, this is a test', 'out.wav')
Multi-speaker StyleTTS 2 with zero-shot voice cloning:
from speechtoolkit.tts import MultiSpeakerStyleTTS2Model
model = MultiSpeakerStyleTTS2Model()
model.infer_to_file('Hello, this is a test', 'sample.wav', 'out.wav')
Automatic Speech Recognition
from speechtoolkit.asr import WhisperModel
model = WhisperModel()
model.infer_file('audio.wav')
With a larger model:
from speechtoolkit.asr import WhisperModel
model = WhisperModel('medium')
model.infer_file('audio.wav')
With DistilWhisper:
from speechtoolkit.asr import DistilWhisperModel
model = DistilWhisperModel()
model.infer_file('audio.wav')
Voice Conversion
from speechtoolkit.vc import LVC
vc = LVC(device='auto')
vc.infer_file(
'original.wav',
'sample.wav',
'out.wav'
)
Language Classification
from speechtoolkit.classification.languageclassification import WhisperLanguageClassifierModel
lc = WhisperLanguageClassifierModel()
lc.infer_file('audio.wav') # 'en'
Accent Classification
from speechtoolkit.classification.accentclassification import EdAccAccentClassifierModel
ac = EdAccAccentClassifierModel()
ac.infer_file('audio.wav') # 'Mainstream US English'