LongCat-AudioDiT — Voice Cloning Studio

State-of-the-art voice cloning based on LongCat-AudioDiT by the Meituan LongCat Team. Provide a reference audio clip, type the text you want spoken, and get synthesised speech in that voice.

Research & Testing Only. This tool is provided strictly for research, educational, and personal experimentation purposes. It is not intended for generating deceptive, misleading, or harmful content. Do not use it to impersonate real individuals without their explicit consent, to create non-consensual deepfakes, or for any activity that violates applicable laws or regulations. By using this tool you accept full responsibility for ensuring your use complies with all relevant laws in your jurisdiction.

Memory Mode

Device

Model Status

Reference Voice

Saved Voices

Reference Audio (upload or record)

Whisper Model for Auto-Transcribe

Language (auto=detect)

Reference Transcription (auto-filled or type manually)

Save this voice to library

Voice Name

Library

Text to Synthesise

Text

AudioDiT Model

Guidance

Output

Status

API Endpoints (Gradio REST API)

Endpoint	Description
`POST /api/clone_voice`	Clone a voice: text + reference audio + transcription → audio
`POST /api/transcribe_reference`	Transcribe reference audio with Whisper
`POST /api/plain_tts`	Generate speech without a reference voice
`POST /api/transcribe`	Transcribe any audio file
`POST /api/save_voice`	Save a voice to the library
`POST /api/load_voice`	Load a voice from the library by name
`POST /api/delete_voice`	Delete a voice from the library
`POST /api/list_voices`	List all saved voices
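
As a rough illustration of how these endpoints might be called from a script, the sketch below builds (without sending) a JSON POST request using only the Python standard library. Several things here are assumptions rather than details confirmed by this page: the base URL (a local instance on Gradio's default port 7860), the `{"data": [...]}` request envelope, and the helper name `build_request`, which is hypothetical. The order and types of the inputs for each endpoint are not documented here, so check the running app's API page before relying on them.

```python
import json
import urllib.request

# Assumption: the app is running locally on Gradio's default port.
BASE_URL = "http://127.0.0.1:7860"


def build_request(endpoint, inputs, base_url=BASE_URL):
    """Build (but do not send) a POST request for a named endpoint.

    Hypothetical helper. The {"data": [...]} envelope is an assumption
    about how this app expects its inputs; the positional order of the
    inputs must match whatever the running app actually declares.
    """
    body = json.dumps({"data": list(inputs)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/{endpoint.strip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# list_voices is used here because it plausibly takes no inputs (assumption).
req = build_request("list_voices", [])
print(req.get_method(), req.full_url)

# To send it against a running instance, uncomment:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it makes it easy to inspect the URL and body before pointing the script at a live instance.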

Models

Model	VRAM	Notes
AudioDiT-1B	~4 GB	Fast, great quality
AudioDiT-3.5B	~10 GB	SOTA quality
Whisper Turbo	~1.6 GB	Fast transcription
Whisper large-v3	~3 GB	Most accurate
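
The VRAM figures above can be turned into a quick budget check. The sketch below is only an estimate under stated assumptions: it treats the approximate figures as additive, assumes one AudioDiT model and one Whisper model are resident on the GPU at the same time, and ignores any overhead or savings from the app's Memory Mode setting. The helper names `fits` and `pick_pair` are hypothetical.

```python
# Approximate VRAM figures in GB, copied from the table above.
VRAM_GB = {
    "AudioDiT-1B": 4.0,
    "AudioDiT-3.5B": 10.0,
    "Whisper Turbo": 1.6,
    "Whisper large-v3": 3.0,
}


def fits(models, budget_gb):
    """Return True if the models' combined VRAM estimate is within budget.

    Assumption: usage is simply additive and all listed models are
    loaded at once.
    """
    return sum(VRAM_GB[name] for name in models) <= budget_gb


def pick_pair(budget_gb):
    """Pick an (AudioDiT, Whisper) pair, preferring the larger models.

    Returns None if even the smallest pair exceeds the budget.
    """
    candidates = [
        ("AudioDiT-3.5B", "Whisper large-v3"),
        ("AudioDiT-3.5B", "Whisper Turbo"),
        ("AudioDiT-1B", "Whisper large-v3"),
        ("AudioDiT-1B", "Whisper Turbo"),
    ]
    for pair in candidates:
        if fits(pair, budget_gb):
            return pair
    return None


print(pick_pair(8))   # an 8 GB card
print(pick_pair(12))  # a 12 GB card
```

Under these assumptions, an 8 GB budget leaves room for AudioDiT-1B with either Whisper model, while AudioDiT-3.5B paired with Whisper large-v3 needs roughly 13 GB.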
