Showing 316 open source projects for "speech"

View related business solutions
  • Earn up to 15% annual interest with Nexo. Icon
    Earn up to 15% annual interest with Nexo.

    Access competitive interest rates on your digital assets.

    Generate interest, borrow against your crypto, and trade a range of cryptocurrencies — all in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • Earn up to 15% annual interest with Nexo. Icon
    Earn up to 15% annual interest with Nexo.

    More flexibility. More control.

    Generate interest, access liquidity without selling, and execute trades seamlessly. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 1
    Hugging Face - Speech To Speech

    Hugging Face - Speech To Speech

    Open speech-to-speech models and pipelines by Hugging Face toolkit AI

    This project from Hugging Face focuses on enabling direct speech-to-speech processing using modern machine learning models. It provides tools and reference implementations that allow audio input to be transformed into audio output without requiring an intermediate text representation. Hugging Face - Speech To Speech builds on recent advances in speech modeling, combining components such as speech recognition, translation, and synthesis into unified pipelines. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    Fish Speech

    Fish Speech

    SOTA Open Source TTS

    Fish Speech is a state-of-the-art open-source text-to-speech project that has evolved into the OpenAudio series of advanced TTS models. The repository hosts the code and tooling for training, fine-tuning, and serving high-quality TTS, while the current flagship models (OpenAudio-S1 and S1-mini) are distributed via Fish Audio’s playground and Hugging Face.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 3
    Speech-AI-Forge

    Speech-AI-Forge

    Speech-AI-Forge is a project developed around TTS generation model

    Speech-AI-Forge is a full-stack project built around modern text-to-speech generation models, providing both an API server and a Gradio-based web UI for interactive use. At its core, it acts as a hub that wires together multiple speech-related capabilities, including TTS, speech-to-text and LLM-based control flows, behind a consistent interface.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    Whisper

    Whisper

    Robust Speech Recognition via Large-Scale Weak Supervision

    OpenAI Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. ...
    Downloads: 69 This Week
    Last Update:
    See Project
  • Create and manage the email signature you need Icon
    Create and manage the email signature you need

    For companies and organizations that need an email signature solution

    With WiseStamp it’s easy to unify your brand and turn your emails into a powerful marketing tool. Get the most out of your emails with a professionally designed custom email signature.
    Learn More
  • 5
    MLX-Audio

    MLX-Audio

    A text-to-speech, speech-to-text and speech-to-speech library

    MLX-Audio is a speech library built on Apple’s MLX framework and optimized for Apple Silicon machines (M-series Macs). It focuses on text-to-speech and speech-to-speech workflows, with APIs and a command-line interface that make it easy to generate high-quality audio from text. Because it uses MLX and targets Apple Silicon, inference is fast and can take advantage of hardware acceleration and quantization for efficient on-device performance.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 6
    SpeechRecognition

    SpeechRecognition

    Speech recognition module for Python

    Library for performing speech recognition, with support for several engines and APIs, online and offline. Recognize speech input from the microphone, transcribe an audio file, save audio data to an audio file. Show extended recognition results, calibrate the recognizer energy threshold for ambient noise levels (see recognizer_instance.energy_threshold for details).
    Downloads: 9 This Week
    Last Update:
    See Project
  • 7
    VoxCPM2

    VoxCPM2

    Tokenizer-Free TTS for Multilingual Speech Generation

    VoxCPM2 is an advanced open-source text-to-speech system that redefines speech synthesis by eliminating traditional tokenization and instead generating continuous speech representations through a diffusion-based autoregressive architecture. Built on top of the MiniCPM model family, it enables highly natural, expressive, and context-aware speech generation that adapts tone, emotion, and pacing directly from input text.
    Downloads: 21 This Week
    Last Update:
    See Project
  • 8
    SenseVoice

    SenseVoice

    Multilingual speech recognition and audio understanding model

    SenseVoice is a speech foundation model designed to perform multiple voice understanding tasks from audio input. It provides capabilities such as automatic speech recognition, spoken language identification, speech emotion recognition, and audio event detection within a single system. SenseVoice is trained on more than 400,000 hours of speech data and supports over 50 languages for multilingual recognition tasks.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 9
    VoxCPM

    VoxCPM

    TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

    VoxCPM is a tokenizer-free text-to-speech system that models speech in a continuous space, aiming for extremely realistic, context-aware synthesis and true-to-life zero-shot voice cloning. Instead of converting speech into discrete tokens, it uses an end-to-end diffusion-autoregressive architecture built on the MiniCPM-4 backbone, combining hierarchical language modeling, finite scalar quantization (FSQ), and local Diffusion Transformers.
    Downloads: 58 This Week
    Last Update:
    See Project
  • Managed Cybersecurity Platform Built for MSPs Icon
    Managed Cybersecurity Platform Built for MSPs

    Discover the cyber platform that secures and insures SMEs

    In a world that lives and breathes all things digital, every business is at risk. Cybersecurity has become a major problem for small and growing businesses due to limited budgets, resources, time, and training. Hackers are leveraging these vulnerabilities, and most of the existing cybersecurity solutions on the market are too cumbersome, too complicated, and far too costly.
    Learn More
  • 10
    Step-Audio

    Step-Audio

    Open-source framework for intelligent speech interaction

    Step-Audio is a unified, open-source framework aimed at building intelligent speech systems that combine both comprehension and generation: it integrates large language models (LLMs) with speech input/output to handle not only semantic understanding but also rich vocal characteristics like tone, style, dialect, emotion, and prosody. The design moves beyond traditional separate-component pipelines (ASR → text model → TTS), instead offering a multimodal model that ingests speech or audio and produces speech accordingly, enabling natural dialogue, voice cloning, and expressive speech synthesis. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    Voice-Pro

    Voice-Pro

    Comprehensive Gradio WebUI for audio processing

    Voice-Pro is the best gradio WebUI for transcription, translation and text-to-speech. It can be easily installed with one click. Create a virtual environment using Miniconda, running completely separate from the Windows system (fully portable). Supports real-time transcription and translation, as well as batch mode.
    Downloads: 36 This Week
    Last Update:
    See Project
  • 12
    abogen

    abogen

    Generate audiobooks from EPUBs, PDFs and text with captions

    abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 13
    ChatTTS

    ChatTTS

    A generative speech model for daily dialogue

    ChatTTS is an open-source conversational text-to-speech model optimized for dialogue, developed by 2Noise. Trained on 100,000+ hours of English and Chinese conversation data, it excels at generating expressive prosody—pauses, interjections, laughter—for more natural-sounding speech synthesis in assistant and chatbot applications.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 14
    pyVideoTrans

    pyVideoTrans

    Translate the video from one language to another and embed dubbing

    pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the translated subtitles, and then merges that speech back into the video, creating a fully localized media file. ...
    Downloads: 22 This Week
    Last Update:
    See Project
  • 15
    StreamSpeech

    StreamSpeech

    StreamSpeech is a seamless model for offline speech recognition

    StreamSpeech is an “all-in-one” speech model designed to perform offline and simultaneous speech recognition, speech translation, and speech synthesis within a single unified architecture. Developed as part of an ACL 2024 paper, it targets streaming and low-latency scenarios where intermediate results and final translations or synthetic speech must be produced continuously as audio is being received.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    RealtimeSTT

    RealtimeSTT

    A robust, efficient, low-latency speech-to-text library

    RealtimeSTT is a Python-based realtime speech-to-text engine emphasizing low latency, wake-word detection, voice activity detection, and automatic speech segmentation. It provides asynchronous callbacks, nanosecond-precision timestamps, and CLI tools, suitable for building voice assistants, meeting transcribers, or live caption systems.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    edge-tts

    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    edge-tts is a Python module and command-line tool that gives you direct access to Microsoft Edge’s online text-to-speech service without needing the Edge browser, Windows, or any API key. It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common formats like MP3 or WAV. ...
    Downloads: 23 This Week
    Last Update:
    See Project
  • 18
    TADA

    TADA

    Open Source Speech Language Model

    ...This approach can support applications such as conversational AI, speech synthesis, multimodal language modeling, and speech understanding systems. The project explores ways to treat speech and text as integrated data streams rather than separate pipelines, enabling more coherent interactions between language and audio. Because it operates as a generative framework, TADA can be used for research into advanced speech-language models and multimodal artificial intelligence systems.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    WhisperX

    WhisperX

    Automatic Speech Recognition with Word-level Timestamps

    ...Its architecture combines multiple components to enhance both performance and usability in real-world transcription tasks. Overall, whisperx provides a more robust and scalable solution for high-quality speech-to-text applications.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 20
    Applio

    Applio

    A simple, high-quality voice conversion tool focused on ease of use

    ...It focuses strongly on ease of use: installation scripts for Windows, Linux, and macOS set up dependencies and then launch a browser-based Gradio interface. Within that interface, users can train and run voice conversion models for tasks like singing conversion, speech-to-speech transformation, and voice cloning. The project is structured to be flexible through plugins and configurations so users can extend functionality without touching the core code. Applio is considered stable and mature; ongoing development is now centered on security patches, dependency maintenance, and occasional improvements, which makes it attractive for production or repeatable workflows. ...
    Downloads: 101 This Week
    Last Update:
    See Project
  • 21
    kokoro-onnx

    kokoro-onnx

    TTS with kokoro and onnx runtime

    kokoro-onnx is a text-to-speech toolkit that wraps the Kokoro neural TTS model in an easy-to-use ONNX Runtime interface, so you can generate speech from Python with minimal setup. It focuses on running efficiently on commodity hardware, including macOS with Apple Silicon, while still delivering near real-time performance for many use cases. The project ships prebuilt model files and a simple example script, so you can go from installation to producing an audio.wav file in just a few steps. ...
    Downloads: 142 This Week
    Last Update:
    See Project
  • 22
    PersonaPlex

    PersonaPlex

    PersonaPlex code

    PersonaPlex is an open-source real-time conversational speech AI model that goes beyond traditional text chat by providing full-duplex speech-to-speech interaction, meaning it can listen and talk at the same time instead of waiting for you to finish speaking before responding. This architectural approach eliminates awkward pauses and makes conversations feel much more human-like, with natural behaviors such as overlapping speech, interruptions, and fluent turn-taking, traits that traditional AI assistants typically lack. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Spark TTS

    Spark TTS

    Spark-TTS Inference Code

    Spark TTS is an open-source, PyTorch-based text-to-speech inference system that leverages large language models to produce highly natural, intelligible speech from text input. It uses an efficient single-stream architecture where speech tokens are directly reconstructed from the predictions of an LLM, removing the need for external acoustic models or complex vocoders and making the generation pipeline cleaner and faster.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    pyttsx3

    pyttsx3

    Offline Text To Speech synthesis for python

    pyttsx3 is an offline text-to-speech library for Python that wraps native speech engines instead of calling cloud APIs. It is designed to work entirely without an internet connection, making it suitable for local automation, kiosks, accessibility tools, and embedded applications. On Windows it uses SAPI5, on Linux it typically uses eSpeak or eSpeak-NG, and on macOS it can use NSSpeechSynthesizer or AVSpeechSynthesizer, giving it broad cross-platform compatibility.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 25
    IndexTTS2

    IndexTTS2

    Industrial-level controllable zero-shot text-to-speech system

    IndexTTS is a modern, zero-shot text-to-speech (TTS) system engineered to deliver high-quality, natural-sounding speech synthesis with few requirements and strong voice-cloning capabilities. It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output.
    Downloads: 7 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB