Text-to-Speech

Convert text to natural-sounding speech using Mirako's advanced text-to-speech technology. Generate audio from text using premade voices or your custom voice profiles.

To generate speech, choose one of the premade voices or create your own custom voice profile. Most premade voices are multilingual, supporting languages such as English, Cantonese, and Mandarin, so they are a good starting point for most applications.

If you need a specific voice or style that the premade options don't cover, Mirako supports voice cloning, which lets you create a custom voice profile from your own recordings.

Supported Languages

Below are the primary languages supported by Mirako's TTS service. We will continue to expand our language offerings over time.

English
Cantonese (yue)
Mandarin Chinese (mandarin)

Note: If you have specific language requirements, feel free to reach out on our Discord channel.

Quick Start

You can start generating speech using the Mirako CLI tool.

mirako speech tts --text "Hello, this is a sample text-to-speech conversion." --voice <MY_VOICE_ID> -o output.wav

This command generates speech from the provided text using the specified voice profile and writes the audio to output.wav.
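
If you want to script the same call from Python, for example to convert several texts in a loop, one option is to build the argument list and hand it to subprocess. The following is a minimal sketch, not part of the Mirako tooling: build_tts_command and run_tts are hypothetical helper names, only the flags shown in the command above are used, and the mirako CLI is assumed to be installed and on your PATH.

```python
import subprocess


def build_tts_command(text, voice_id, output_path):
    """Build the argument list for `mirako speech tts` (hypothetical helper)."""
    # Only the flags from the Quick Start command are used: --text, --voice, -o.
    return [
        "mirako", "speech", "tts",
        "--text", text,
        "--voice", voice_id,
        "-o", output_path,
    ]


def run_tts(text, voice_id, output_path):
    """Run the CLI; raises subprocess.CalledProcessError on a non-zero exit."""
    # Passing a list instead of a shell string avoids quoting issues in the text.
    subprocess.run(build_tts_command(text, voice_id, output_path), check=True)
```

Building the command separately from running it makes it easy to log or inspect the exact invocation before any audio is generated.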

Dealing with Chinese Language

We support the two major dialects of Chinese: Cantonese and Mandarin. When generating speech in Chinese, you need to specify the dialect using the --chinese-language option:

Use --chinese yue for Cantonese (default if you supply it), or
Use --chinese mandarin for Mandarin.

This tells Mirako which dialect to use when reading the text, which matters because Cantonese and Mandarin share many written characters. The flag is only required when the text contains Chinese characters.
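
Because the dialect setting only matters when the text contains Chinese characters, an application that handles mixed input may want to decide this automatically. The sketch below is one way to do that, under stated assumptions: contains_chinese and pick_chinese_language are hypothetical helpers, and "Chinese characters" is approximated as the CJK Unified Ideographs block plus Extension A, which also matches Japanese kanji.

```python
import re

# Assumption: CJK Unified Ideographs (U+4E00-U+9FFF) plus Extension A
# (U+3400-U+4DBF) is a good enough test for "contains Chinese characters".
_HAN_RE = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]")

VALID_DIALECTS = ("yue", "mandarin")


def contains_chinese(text):
    """Return True if the text has at least one Han character."""
    return bool(_HAN_RE.search(text))


def pick_chinese_language(text, dialect):
    """Return the dialect to send, or None when no dialect is needed."""
    if not contains_chinese(text):
        return None  # No Chinese characters, so the setting can be omitted.
    if dialect not in VALID_DIALECTS:
        raise ValueError(f"dialect must be one of {VALID_DIALECTS}, got {dialect!r}")
    return dialect
```

The helper cannot tell Cantonese from Mandarin on its own; the caller still has to supply the intended dialect.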

Parameters

You can customize the speech output using various parameters:

temperature: Controls the randomness of the speech. Lower values (e.g., 0.5) produce more consistent speech, while higher values (e.g., 1.0) allow for more variation. Default is 1.0.
fragment_interval: Controls the length of the pause between sentences or phrases, which helps the speech sound more natural. Ranges from 0 to 1.0. Default is 0.1.
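
It can help to validate these values before sending a request. The sketch below builds a dictionary in the same shape as the opts field in the API example in this document. build_opts is a hypothetical helper. The 0 to 1.0 range for fragment_interval comes from the description above, while rejecting negative temperature values is an assumption, since no range is documented for it.

```python
def build_opts(temperature=1.0, fragment_interval=0.1):
    """Validate the two parameters and return them as an opts dictionary."""
    # Assumption: temperature has no documented range; only negatives are rejected.
    if temperature < 0:
        raise ValueError(f"temperature must not be negative, got {temperature}")
    # Documented range for fragment_interval is 0 to 1.0.
    if not 0.0 <= fragment_interval <= 1.0:
        raise ValueError(
            f"fragment_interval must be between 0 and 1.0, got {fragment_interval}"
        )
    return {
        "temperature": float(temperature),
        "fragment_interval": float(fragment_interval),
    }
```

Calling build_opts() with no arguments yields the documented defaults, so the helper can be used unconditionally.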

Using the TTS API

Alternatively, you can use the Mirako API to integrate TTS into your application. Below is an example of how to use the TTS API with advanced options like temperature and fragment intervals for more control over the speech output.

python

import requests

# API configuration
API_KEY = "your_api_key_here"
BASE_URL = "https://mirako.co"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

def tts(text, voice_profile_id, chinese_language=None, temperature=1.0, fragment_interval=0.1):
    """Generate speech with advanced options.

    Use chinese_language="yue" for Cantonese or "mandarin" for Mandarin.
    """
    payload = {
        "text": text,
        "voice_profile_id": voice_profile_id,
        "return_type": "file_url",
        "opts": {
            "temperature": temperature,
            "fragment_interval": fragment_interval
        }
    }

    if chinese_language:
        payload["chinese_language"] = chinese_language

    response = requests.post(f"{BASE_URL}/v1/speech/tts", headers=headers, json=payload)
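    if response.status_code == 200:
        # On success, the generated audio details are under the "data" key
        return response.json()["data"]
    # On failure, report the status and body, and return None to the caller
    print(f"Error {response.status_code}: {response.text}")
    return None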


# Generate English speech with custom parameters
result = tts(
    "This is a sample of English text-to-speech with custom parameters.",
    "your_english_voice_id",
    temperature=0.8,        # Lower for more consistent speech
    fragment_interval=0.2    # Longer pauses between sentences
)

# Generate Cantonese speech
cantonese_result = tts(
    "你好，歡迎使用我地嘅語音生成服務！",
    "your_cantonese_voice_id",
    chinese_language="yue"
)
 
# Generate Mandarin speech
mandarin_result = tts(
    "你好，欢迎使用我们的语音生成服务！",
    "your_mandarin_voice_id",
    chinese_language="mandarin"
)

Response

The response includes the URL of the generated audio file and metadata such as duration. Set return_type to receive either a direct audio file URL or a base64-encoded audio string.
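
When you request the base64-encoded form, the string has to be decoded before it can be played or stored. The following is a minimal sketch of that step. save_base64_audio is a hypothetical helper, and it takes the encoded string directly because the name of the response field that carries it is not specified here.

```python
import base64


def save_base64_audio(audio_b64, output_path):
    """Decode a base64 audio string and write the raw bytes to output_path."""
    audio_bytes = base64.b64decode(audio_b64)
    with open(output_path, "wb") as f:
        f.write(audio_bytes)
    # Return the number of bytes written, which is handy for a quick sanity check.
    return len(audio_bytes)
```

With the file URL form, no decoding is needed; the URL can be downloaded or passed to a player as it is.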