Audio & Voice AI Tools
42 tools
Murf AI
Studio-quality AI voiceovers in minutes, no mic needed
Murf AI is a text-to-speech platform offering over 120 natural-sounding AI voices across 20+ languages. Users can create professional voiceovers for videos, podcasts, e-learning, and presentations by simply typing their script. It includes a built-in studio with voice customization, pitch control, and media sync features, making it ideal for content creators and businesses.
Transgate
AI-powered translation and localization for global teams
Transgate is an AI translation tool that helps businesses and developers localize content across multiple languages quickly. It supports document and software string translation with context-aware AI, streamlining localization workflows for teams shipping global products.
Whisper API
Accurate speech-to-text transcription via a simple REST API
Whisper API provides a hosted, developer-friendly REST API built on OpenAI's Whisper speech recognition model. It supports transcription and translation across dozens of languages with high accuracy, making it straightforward for developers to add voice-to-text capabilities to their applications.
Summara
Summarize podcasts and audio content with AI automatically
Summara is an AI tool that transcribes and summarizes podcasts and audio recordings, delivering concise written overviews of long-form audio content. It helps busy professionals and podcast enthusiasts extract key takeaways without listening to full episodes.
Perch Reader
Read smarter with AI highlights and insights on any article
Perch Reader is an AI-enhanced reading app that overlays highlights, summaries, and contextual insights on articles and web pages as you read. It helps knowledge workers and researchers absorb content faster by surfacing the most important information and related context in-line.
Otter.ai
Automatic meeting transcription, notes, and summaries powered by AI
Otter.ai is an AI meeting assistant that provides real-time transcription, automated meeting summaries, and action item extraction for Zoom, Google Meet, and Microsoft Teams calls. It integrates directly with calendar and conferencing tools to join meetings automatically and deliver shareable notes. Teams and professionals use it to capture every detail from meetings without manual note-taking, improving follow-through and accessibility.
Sybill
AI sales assistant that analyzes calls and writes CRM notes
Sybill is an AI tool for sales teams that records and analyzes sales calls, automatically writing CRM notes and follow-up emails while surfacing buyer intent signals. It reads verbal and behavioral cues to give reps insight into prospect engagement and deal health after every conversation.
Loopin AI
Automate meeting recaps and keep your team aligned effortlessly
Loopin AI is a meeting productivity tool that automatically records, transcribes, and summarizes meetings, then distributes recaps and tracks action items across your team. It integrates with calendars and project management tools to close the loop between meetings and actual work.
Descript Overdub
Clone your voice and fix audio recording mistakes by editing text
Descript Overdub lets users create a realistic AI voice clone of themselves to correct or add words in recorded audio simply by editing the transcript. It integrates natively within the Descript editing platform and supports ultra-realistic speech synthesis. Perfect for podcasters, video editors, and content creators who need seamless audio corrections.
Respeecher
Clone and transform voices with professional-grade AI fidelity
Respeecher is a voice cloning platform used in professional film, TV, and game production to convert one speaker's voice into a target voice with high accuracy. It preserves emotion, tone, and nuance, and has been used in major Hollywood productions. Designed for studios, post-production teams, and voice actors needing authentic voice transformation.
ElevenLabs
Generate ultra-realistic AI voices in any language or accent
ElevenLabs offers industry-leading AI text-to-speech and voice cloning with remarkably natural-sounding output across 29+ languages. Users can clone any voice from a short audio sample or choose from a diverse voice library. Used by publishers, game developers, and content creators for narration, dubbing, and interactive applications.
Resemble AI
Build and deploy custom AI voices for any application
Resemble AI provides real-time voice cloning, text-to-speech, and speech-to-speech AI voice generation with enterprise-grade APIs. It offers emotion and pitch control, localization support, and a watermarking feature for ethical AI voice use. Targeted at developers, media companies, and enterprises integrating voice AI into products and workflows.
iSpeech
Convert text to natural AI speech and speech to text via API
iSpeech is a cloud-based text-to-speech and speech recognition platform offering developer APIs for embedding TTS and STT capabilities into apps and websites. It supports multiple voices, languages, and output formats including MP3 and OGG. Suited for developers, e-learning platforms, and businesses needing accessible voice output.
Veritone Voice
License and deploy synthetic AI voices at enterprise scale
Veritone Voice is an enterprise AI voice platform that enables organizations to create, license, and deploy synthetic voice replicas for media, broadcasting, and customer experience. It integrates with Veritone's broader aiWARE platform and supports talent-licensed voice cloning. Designed for media companies, broadcasters, and large enterprises.
Microsoft Azure Neural TTS
Deploy lifelike neural text-to-speech at cloud scale via Azure
Azure Neural TTS (now Azure AI Speech) delivers highly natural text-to-speech using neural network models with 400+ voices across 140+ languages. It supports SSML customization, custom neural voice creation, and real-time or batch synthesis. Built for developers and enterprises integrating speech output into applications, bots, and accessibility tools.
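The entry mentions SSML customization. The sketch below builds the kind of SSML document a neural TTS service consumes, using the standard W3C SSML elements `speak`, `voice`, and `prosody`. The function name `build_ssml` is hypothetical, and the default voice `en-US-JennyNeural` is only an illustrative choice, so check the service's current voice list before relying on it. The code only produces the XML string; sending it to Azure (for example through the Speech SDK) is outside the scope of this sketch.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str, voice: str = "en-US-JennyNeural",
               rate: str = "default", pitch: str = "default",
               lang: str = "en-US") -> str:
    """Return an SSML string that speaks `text` with one voice and prosody settings.

    Hypothetical helper for illustration; the voice name is an assumption.
    """
    # Root <speak> element; version and xml:lang are written as plain attributes.
    speak = ET.Element("speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang})
    # <voice> selects which synthetic voice reads the enclosed text.
    voice_el = ET.SubElement(speak, "voice", {"name": voice})
    # <prosody> adjusts speaking rate and pitch (e.g. "-10%", "+5%", or "default").
    prosody = ET.SubElement(voice_el, "prosody", {"rate": rate, "pitch": pitch})
    prosody.text = text  # ElementTree escapes &, <, and > in the text automatically
    return ET.tostring(speak, encoding="unicode")
```

For example, `build_ssml("Welcome back.", rate="-10%", pitch="+5%")` yields a single `<speak>` document that slows the voice slightly and raises its pitch; building the XML with a library rather than string concatenation keeps user-supplied text safely escaped.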
Zenmic.com
Enhance microphone audio quality with real-time AI noise removal
Zenmic is an AI-powered audio enhancement tool that removes background noise, echo, and distortion from microphone input in real time. It works as a virtual audio device compatible with most conferencing and recording apps. Suited for remote workers, podcasters, and streamers who need clean professional-sounding audio without expensive hardware.
Audify AI
Convert articles and text content into listenable audio instantly
Audify AI transforms written content such as articles, blog posts, and documents into natural-sounding audio files using AI text-to-speech. It enables publishers and content creators to offer audio versions of their content quickly. Designed for bloggers, newsletter publishers, and media sites looking to improve accessibility and audience engagement.
Splash Pro
Make original AI-generated music tracks without any music skills
Splash Pro is an AI music creation platform that lets users generate royalty-free original songs from simple prompts, genre selections, and lyrics. It uses deep learning to compose, mix, and produce full tracks in seconds. Ideal for content creators, game developers, and marketers who need custom background music without hiring composers.
AIVA
Compose original AI-generated music for film, games, and media
AIVA (Artificial Intelligence Virtual Artist) is an AI music composer that generates original scored music in hundreds of styles for film, advertising, games, and media. Users can influence compositions via mood, instrumentation, and tempo settings, and receive full commercial licensing. Trusted by composers, game studios, and content creators worldwide.
Mubert
Stream and generate royalty-free AI music for any use case
Mubert is a generative AI music platform that creates continuous, royalty-free soundtracks tailored to mood, genre, tempo, and duration in real time. It offers API access for developers and a creator platform for influencers and brands. Used widely by streamers, app developers, and marketers needing dynamic, on-demand background music.
Soundraw
Customize and generate royalty-free AI music tracks on demand
Soundraw is an AI music generator that allows users to create and fine-tune royalty-free tracks by selecting mood, genre, tempo, and length, with adjustable instrument layers. Once generated, tracks can be exported and used commercially. Designed for video creators, YouTubers, and filmmakers who need unique background music quickly.
Beatoven.ai
Generate mood-based royalty-free AI music for videos and podcasts
Beatoven.ai creates original royalty-free music by letting users select scenes, moods, and genres, then generates adaptive tracks that fit the emotional arc of their content. It supports timeline-based mood changes within a single track. Built for video editors, podcasters, and content creators who need expressive, context-aware background music.
Boomy
Create and release original AI music to streaming platforms instantly
Boomy lets anyone generate original AI songs in seconds across multiple genres, then distribute them directly to Spotify, Apple Music, and other streaming platforms. Users can earn royalties from streams, and little music knowledge is required. Aimed at aspiring musicians, hobbyists, and creators who want to participate in the music economy.
Loudly
Compose AI-powered royalty-free music tracks with genre controls
Loudly is an AI music platform offering a large library of royalty-free tracks alongside generative AI tools for creating custom music by adjusting genre, energy, and instrumentation. It provides stem-level editing for further customization. Suited for music producers, content creators, and brands needing flexible, licensable music assets.
Soundful
Generate high-quality royalty-free AI music at the click of a button
Soundful uses AI to generate unique, royalty-free background music tracks across a wide range of genres and moods with a single click. Each track is distinct and export-ready for use in videos, streams, and podcasts. Targeted at content creators, streamers, and podcasters who need consistent, professional-sounding music without licensing concerns.
AI Music Generator
Generate custom AI songs from text prompts for free online
AI Music Generator (aisongmaker.io) is a free online tool that creates original songs from text descriptions, selecting appropriate genre, tempo, and instrumentation automatically. It requires no sign-up for basic use and produces downloadable audio files. Designed for casual creators, students, and anyone experimenting with AI-generated music.
AI Voice Agents
Deploy smart AI voice agents for inbound and outbound calling
Diallink's AI Voice Agents platform enables businesses to create and deploy conversational AI agents that handle inbound and outbound phone calls for customer support, appointment booking, and lead qualification. Calls are conducted in natural language with real-time transcription and CRM integration. Built for SMBs and enterprises automating voice-based customer interactions.
Play.ht
Convert any text to ultra-realistic AI voices for audio content
Play.ht is an AI text-to-speech platform offering 900+ voices in 142+ languages with ultra-realistic neural voice output and instant voice cloning from short audio samples. It provides a podcast hosting feature, WordPress plugin, and API access. Widely used by podcasters, publishers, and developers building voice-enabled applications.
Coqui
Open-source AI voice cloning and text-to-speech for developers
Coqui is an open-source AI platform for text-to-speech and voice cloning built on deep learning models, including XTTS, that let developers create natural-sounding voices with minimal data. It offered both a hosted product and open-source libraries, which were widely adopted by the developer and research community before the company's commercial pivot. Best suited for engineers and researchers building custom voice AI.
VALL-E X
Cross-lingual voice cloning that preserves speaker identity across languages
VALL-E X is a research model from Microsoft that extends the VALL-E language model to perform cross-lingual speech synthesis and voice cloning, reproducing a speaker's voice in a different language from just a short audio prompt. It maintains speaker emotion and acoustic environment across languages. Aimed at researchers and developers advancing multilingual speech AI.
CustomPod.io
Create and sell AI-powered custom podcast content at scale
CustomPod.io lets creators generate personalized podcast episodes using AI voices and scripts. It streamlines audio content production for brands, educators, and creators who want to launch or scale a podcast without extensive recording infrastructure.
Harmonai
Open-source AI music generation tools for everyone
Harmonai is a community-driven research organization building open-source generative audio tools. Its flagship project Dance Diffusion lets musicians and developers create original music using diffusion-based AI models, making cutting-edge music generation freely accessible.
MusicLM
Generate high-fidelity music from text descriptions with Google AI
MusicLM is Google Research's text-to-music model that produces high-quality, coherent music clips from natural language prompts. It can follow detailed genre, mood, and instrument descriptions, representing a major leap in AI-driven music synthesis for researchers and enthusiasts.
Remusic
Compose original AI music tracks in seconds from a prompt
Remusic is an AI music generation platform that creates royalty-free songs from text or mood inputs. Users can customize genre, tempo, and instrumentation, making it ideal for content creators, game developers, and marketers who need original background music fast.
AI Wedding Toast
Write a heartfelt, personalized wedding toast with AI in minutes
AI Wedding Toast guides users through a simple questionnaire about the couple and their relationship, then generates a warm, humorous, or heartfelt speech tailored to the occasion. It is perfect for best men, maids of honor, and family members who want to deliver a memorable toast without the stress of writing from scratch.
Fireflies.ai
AI notetaker that records, transcribes, and analyzes your meetings
Fireflies.ai is an AI meeting assistant that automatically joins video calls to record, transcribe, and produce searchable summaries with speaker identification and sentiment analysis. It integrates with over 40 tools including Salesforce, HubSpot, Slack, and Notion to push meeting insights directly into existing workflows. Sales teams, recruiters, and remote teams use it to reduce manual documentation and surface key decisions and action items from every conversation.
Whisper
OpenAI's open-source speech recognition model for any audio
Whisper is an open-source automatic speech recognition (ASR) system from OpenAI trained on 680,000 hours of multilingual audio data. It performs robust transcription and translation across 99 languages with strong accuracy even in noisy conditions or with accented speech. Developers and researchers use it as a foundation for transcription apps, voice assistants, subtitle generation, and audio data processing pipelines.
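Subtitle generation is one of the uses listed above. The sketch below assumes segment dictionaries shaped like those the open-source `openai-whisper` package returns in the `"segments"` list of `model.transcribe()` output, each with `start` and `end` times in seconds and a `text` string, and turns them into SRT subtitle text. The helper names `format_timestamp` and `segments_to_srt` are hypothetical, and the sample segments below are invented for illustration rather than taken from a real transcription.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text from segments with 'start', 'end' (seconds) and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Whisper segment text often begins with a space, so strip surrounding whitespace.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive SRT cues.
    return "\n".join(blocks)


segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 3661.042, "text": " Thanks for listening."},
]
print(segments_to_srt(segments))
```

Each SRT cue is a sequence number, a `start --> end` line with comma-separated milliseconds, the caption text, and a blank line, so the second sample segment is rendered with the timestamp `01:01:01,042`. The package's command-line tool can also write SRT files directly; the hand-rolled version here only shows the format.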
Wispr Flow
Dictate naturally anywhere on your Mac with AI-powered transcription
Wispr Flow is a macOS dictation tool powered by AI that lets users speak naturally and converts speech to polished, context-aware text across any application. Unlike standard dictation, it cleans up filler words, formats output appropriately for the target app, and learns personal writing style over time. It is designed for professionals who want to write faster by speaking, without switching away from their current workflow.
AudioCraft
Meta's open-source AI framework for music and audio generation
AudioCraft is an open-source AI research framework from Meta that includes MusicGen, AudioGen, and EnCodec models for generating high-quality music and sound effects from text descriptions. MusicGen can produce full instrumental tracks in various styles, while AudioGen focuses on environmental and ambient sounds. It is targeted at researchers, audio engineers, and developers who want to build or experiment with generative audio applications.
Stable Audio
Generate high-quality music and sound effects from text prompts
Stable Audio from Stability AI is a latent diffusion model that generates music tracks and sound effects from text prompts, with control over duration and style. It supports generation of full-length stereo audio at 44.1 kHz, making it suitable for professional-quality output. Musicians, sound designers, and content creators use it to quickly prototype musical ideas or generate royalty-free audio assets.
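To put stereo audio at 44.1 kHz in concrete terms, the sketch below computes the size of the equivalent uncompressed PCM data. The 16-bit sample depth is an assumption (the entry does not state a bit depth or export format), the function name `pcm_size_bytes` is hypothetical, and the result describes raw audio rather than Stable Audio's internal latent representation or its actual download files.

```python
def pcm_size_bytes(duration_s: float, sample_rate_hz: int = 44_100,
                   channels: int = 2, bits_per_sample: int = 16) -> int:
    """Bytes of raw PCM audio: samples/s x channels x bytes per sample x seconds.

    The 16-bit default is an assumption made for illustration.
    """
    bytes_per_second = sample_rate_hz * channels * bits_per_sample // 8
    return int(bytes_per_second * duration_s)


# One second of 16-bit stereo at 44.1 kHz: 44,100 x 2 x 2 = 176,400 bytes.
print(pcm_size_bytes(1))
# A three-minute track: 176,400 x 180 = 31,752,000 bytes (about 31.8 MB).
print(pcm_size_bytes(180))
```

Under these assumptions the raw data rate is 176,400 bytes per second, or 1,411,200 bits per second, which is why long generated tracks are usually exported in a compressed format.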
Suno AI
Create full songs with vocals and instruments from a text prompt
Suno AI is a generative music platform that creates complete, radio-ready songs—including vocals, lyrics, and instrumentation—from simple text descriptions. Users can specify genre, mood, and style to produce surprisingly polished tracks within seconds. It is widely used by hobbyists, content creators, and musicians looking for rapid song prototyping or royalty-free background music without music production expertise.
Udio
Generate studio-quality songs and music across any genre instantly
Udio is an AI music generation platform that produces high-fidelity songs with realistic vocals, layered instruments, and nuanced musical arrangements from text prompts. It allows fine-grained control through custom lyrics, genre tags, and reference inputs to guide the output style. The platform appeals to musicians, producers, and content creators who want to explore generative music with a high degree of sonic quality.