Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
Toolkit for conversational AI
A PyTorch-based Speech Toolkit
High-Quality Voice Cloning TTS for 600+ Languages
A single Gradio + React WebUI with extensions for ACE-Step
State-of-the-art TTS model under 25MB
Qwen3-Omni is a natively end-to-end, omni-modal LLM
On-device Speech Recognition for Apple Silicon
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Synchronized Translation for Videos
An Open Source text-to-speech system built by inverting Whisper
Towards Human-Sounding Speech
Python library and CLI tool to interface with Google Translate
kaldi-asr/kaldi is the official location of the Kaldi project
Open-source industrial-grade ASR models
Foundational model for human-like, expressive TTS
Framework for building real-time multimodal voice AI agent apps
Capable of understanding text, audio, vision, video
GLM-4-Voice | End-to-End Chinese-English Conversational Model
A lightweight text-to-speech model with zero-shot voice cloning
Free, high-quality text-to-speech API endpoint to replace OpenAI
Cross-platform software for text translation and recognition
MARS5 speech model (TTS) from CAMB.AI
A speech-text foundation model for real time dialogue
Framework for building real-time voice and multimodal AI agents