Text-to-Speech
Convert text to natural-sounding speech using Mirako's advanced text-to-speech technology. Generate audio from text using premade voices or your custom voice profiles.
To generate speech, you can choose from a variety of premade voices or create your own custom voice profiles. Premade voices are usually multilingual, supporting languages like English, Cantonese, and Mandarin, which makes them a good starting point for most applications.
If you need a specific voice or style that the premade options don't cover, Mirako supports voice cloning, which lets you create a custom voice profile based on your own recordings.
Supported Languages
Below are the primary languages supported by Mirako's TTS service. We will continue to expand our language offerings over time.
- English
- Cantonese (yue)
- Mandarin Chinese (mandarin)
Note: If you have specific language requirements, feel free to reach out to us on our Discord channel.
Quick Start
You can start generating speech using the Mirako CLI tool.
mirako speech tts --text "Hello, this is a sample text-to-speech conversion." --voice <MY_VOICE_ID> -o output.wav
This command will generate speech from the provided text using the specified voice profile.
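If you call the CLI from a script, the command above can be assembled programmatically. The following is a minimal sketch, assuming the `mirako` binary is on your PATH. `build_tts_command` and `run_tts` are hypothetical helper names, not part of Mirako, and only the flags shown above are used.

```python
import subprocess


def build_tts_command(text, voice_id, output_path="output.wav"):
    """Assemble the argument list for the `mirako speech tts` command shown above."""
    return [
        "mirako", "speech", "tts",
        "--text", text,
        "--voice", voice_id,
        "-o", output_path,
    ]


def run_tts(text, voice_id, output_path="output.wav"):
    """Run the CLI (hypothetical wrapper); raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_tts_command(text, voice_id, output_path), check=True)
    return output_path
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or other special characters.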
Dealing with Chinese Language
We support the two major dialects of Chinese: Cantonese and Mandarin. When generating speech in Chinese, specify the dialect using the --chinese-language option:
- Use --chinese yue for Cantonese (the default), or
- Use --chinese mandarin for Mandarin.
This tells Mirako which dialect to use when interpreting the text, since many characters are shared between the Chinese languages. The flag is only required when the text contains Chinese characters.
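Because the dialect setting only matters when the text contains Chinese characters, a script can check for them before deciding whether to send it. The following is a rough sketch, not Mirako's own detection logic. `contains_chinese` and `chinese_language_for` are hypothetical helpers, and the check covers only the main CJK Unified Ideographs block and Extension A, which is an assumption about what counts as "Chinese characters" here.

```python
def contains_chinese(text):
    """Return True if the text has at least one CJK ideograph.

    Covers U+4E00..U+9FFF (main block) and U+3400..U+4DBF (Extension A) only.
    """
    return any(
        "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"
        for ch in text
    )


def chinese_language_for(text, dialect="yue"):
    """Return the dialect code to send, or None when no Chinese characters are present."""
    if dialect not in ("yue", "mandarin"):
        raise ValueError("dialect must be 'yue' or 'mandarin'")
    return dialect if contains_chinese(text) else None
```

Note that the script cannot tell Cantonese from Mandarin by looking at the characters alone, so the caller still has to choose the dialect.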
Parameters
You can customize the speech output using various parameters:
- temperature: Controls the randomness of the speech. Lower values (e.g., 0.5) produce more consistent speech, while higher values (e.g., 1.0) allow for more variation. Default is 1.0.
- fragment_interval: Controls the length of the pause between sentences or phrases, ranging from 0 to 1.0. This helps make the speech sound more natural. Default is 0.1.
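Validating these values before sending a request can catch mistakes early. The following is a minimal sketch using the defaults and the fragment_interval range documented above. `build_opts` is a hypothetical helper, and the requirement that temperature be positive is an assumption, since no range for it is documented here.

```python
def build_opts(temperature=1.0, fragment_interval=0.1):
    """Return a dict of speech options after basic range checks."""
    # Documented range for fragment_interval is 0 to 1.0.
    if not 0.0 <= fragment_interval <= 1.0:
        raise ValueError("fragment_interval must be between 0 and 1.0")
    # Assumption: temperature must be positive (no range is documented).
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return {
        "temperature": temperature,
        "fragment_interval": fragment_interval,
    }
```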
Using the TTS API
Alternatively, you can use the Mirako API to integrate TTS into your application. Below is an example of how to use the TTS API with advanced options like temperature and fragment intervals for more control over the speech output.
import requests

# API configuration
API_KEY = "your_api_key_here"
BASE_URL = "https://mirako.co"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

def tts(text, voice_profile_id, chinese_language=None, temperature=1.0, fragment_interval=0.1):
    """Generate speech with advanced options.

    Use chinese_language="yue" for Cantonese, "mandarin" for Mandarin.
    """
    payload = {
        "text": text,
        "voice_profile_id": voice_profile_id,
        "return_type": "file_url",
        "opts": {
            "temperature": temperature,
            "fragment_interval": fragment_interval
        }
    }
    # Only send the dialect when one was requested
    if chinese_language:
        payload["chinese_language"] = chinese_language

    response = requests.post(f"{BASE_URL}/v1/speech/tts", headers=headers, json=payload)
    if response.status_code == 200:
        return response.json()['data']
    else:
        print(f"Error: {response.text}")
        return None

# Generate English speech with custom parameters
result = tts(
    "This is a sample of English text-to-speech with custom parameters.",
    "your_english_voice_id",
    temperature=0.8,        # Lower for more consistent speech
    fragment_interval=0.2   # Longer pauses between sentences
)

# Generate Cantonese speech
cantonese_result = tts(
    "你好,歡迎使用我地嘅語音生成服務!",
    "your_cantonese_voice_id",
    chinese_language="yue"
)

# Generate Mandarin speech
mandarin_result = tts(
    "你好,欢迎使用我们的语音生成服务!",
    "your_mandarin_voice_id",
    chinese_language="mandarin"
)
Response
The response will include the generated audio file URL and metadata such as duration. You can specify the return_type to get either a direct audio file URL or a base64-encoded audio string.
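When the audio comes back as a base64-encoded string, it has to be decoded before it can be played or saved. The following is a minimal sketch using only the Python standard library. `save_base64_audio` is a hypothetical helper, and it takes the encoded string directly because the response field that carries it is not documented here.

```python
import base64


def save_base64_audio(audio_b64, output_path):
    """Decode a base64 audio string and write the raw bytes to output_path.

    Returns the number of bytes written.
    """
    audio_bytes = base64.b64decode(audio_b64)
    with open(output_path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)
```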