Avatar Motion Videos

Create realistic videos featuring full-body animated avatars with expressive gestures, dynamic scenes, and lip-synced speech. The Mirako API generates avatar motion videos where avatars perform natural movements and interact with their environment while speaking.

Avatar motion videos are perfect for:

Social media content and product advertising
Interactive storytelling and narratives
Virtual training and demonstrations
Dynamic marketing and advertising
Educational content with character interactions
Virtual events and presentations

Quick Start

To get started quickly, create an avatar motion video with the mirako CLI:

mirako video generate --model motion \
  --image portrait.jpg \
  --audio speech.wav \
  --positive_prompt "Woman walking in a park, smiling and waving" \
  --negative_prompt "looking away, static pose"

image is the image used as the starting frame of the video.
audio is the audio file containing the speech the avatar will speak.
positive_prompt describes the desired avatar motion and expressions.
negative_prompt describes what to avoid in the motion generation.
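
If you drive the CLI from a script, the command above can be assembled as an argument list. This is a minimal sketch that assumes the mirako executable is on your PATH; build_generate_command is a hypothetical helper, and it uses only the flags shown above.

```python
import shlex

def build_generate_command(image, audio, positive_prompt, negative_prompt):
    """Return the argv list for the `mirako video generate` call shown above."""
    return [
        "mirako", "video", "generate",
        "--model", "motion",
        "--image", image,
        "--audio", audio,
        "--positive_prompt", positive_prompt,
        "--negative_prompt", negative_prompt,
    ]

cmd = build_generate_command(
    "portrait.jpg",
    "speech.wav",
    "Woman walking in a park, smiling and waving",
    "looking away, static pose",
)

# shlex.join quotes arguments containing spaces, so the result is safe to log or paste
print(shlex.join(cmd))

# To execute it (requires the CLI to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when prompts contain spaces or quotes.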

Integrate using REST API

You can integrate avatar motion video generation into your applications using our REST API.

Generating an avatar motion video is an asynchronous process with two steps:

Start the generation task - Send a request with an image, audio, and motion prompts.
Poll for status - Check the status of the video generation task until it finishes.

Alternatively, you can use webhooks to be notified when the video is ready.

To start generating an avatar motion video:

python

import requests
import base64
import time

# API configuration
API_KEY = "your_api_key_here"
BASE_URL = "https://mirako.co"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

def create_avatar_motion(image_path, audio_path, positive_prompt, negative_prompt):
    """Create avatar motion video from image, audio, and prompts"""

    # Encode image to base64
    with open(image_path, "rb") as image_file:
        image_data = base64.b64encode(image_file.read()).decode('utf-8')

    # Encode audio to base64
    with open(audio_path, "rb") as audio_file:
        audio_data = base64.b64encode(audio_file.read()).decode('utf-8')

    payload = {
        "image": image_data,
        "audio": audio_data,
        "positive_prompt": positive_prompt,
        "negative_prompt": negative_prompt
    }

    response = requests.post(
        f"{BASE_URL}/v1/video/async_generate_avatar_motion",
        headers=headers,
        json=payload,
        timeout=60  # Avoid hanging indefinitely on a stalled connection
    )

    if response.status_code == 200:
        result = response.json()
        task_id = result['data']['task_id']
        print("✅ Avatar motion generation started!")
        print(f"Task ID: {task_id}")
        return task_id
    else:
        print(f"❌ Error: {response.status_code}")
        print(response.text)
        return None

# Generate avatar motion
task_id = create_avatar_motion(
    "portrait.jpg",
    "speech.wav",
    "A happy young man laughing and gesturing enthusiastically",
    "blurry, low quality, static pose"
)

Input Requirements

Image Requirements

Formats: JPG, PNG
Size: Minimum 512x512 pixels, maximum 1920x1080 pixels.
Quality: Clear frontal face, good lighting
Face: Single person, looking forward, with mouth closed
Background: Simple background preferred
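
The format and size limits above can be checked before uploading. This is a minimal sketch, not part of the Mirako API: check_image is a hypothetical helper, the caller supplies the pixel dimensions (for example from an imaging library), and the maximum is read as width up to 1920 and height up to 1080, which is an assumption since the requirement does not say how portrait orientations are treated.

```python
from pathlib import Path

ALLOWED_IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # JPG and PNG, per the list above
MIN_SIDE = 512
MAX_WIDTH, MAX_HEIGHT = 1920, 1080  # assumed interpretation of "1920x1080"

def check_image(path, width, height):
    """Return a list of problems; an empty list means the image passes these checks."""
    problems = []
    if Path(path).suffix.lower() not in ALLOWED_IMAGE_SUFFIXES:
        problems.append("format must be JPG or PNG")
    if width < MIN_SIDE or height < MIN_SIDE:
        problems.append("image is smaller than 512x512")
    if width > MAX_WIDTH or height > MAX_HEIGHT:
        problems.append("image is larger than 1920x1080")
    return problems

print(check_image("portrait.jpg", 1024, 1024))
print(check_image("portrait.gif", 400, 400))
```

The face, lighting, and background requirements are not covered here; they need a visual check.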

Audio Requirements

Formats: WAV, MP3
Duration: Up to 3 minutes per video
Quality: Clear speech, minimal background noise
Sample Rate: 44.1kHz or 48kHz recommended.
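
For WAV input, the duration and sample rate can be read locally with Python's standard wave module before sending a request. This is a sketch under stated assumptions: inspect_wav and check_wav are hypothetical helpers, MP3 files are not handled, and the sample rate check is only a warning because the rates above are recommendations.

```python
import io
import wave

MAX_DURATION_SECONDS = 180            # "up to 3 minutes" from the requirements above
RECOMMENDED_RATES = {44100, 48000}    # 44.1 kHz or 48 kHz

def inspect_wav(source):
    """Return (duration_seconds, sample_rate) for a WAV path or binary file object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return wav.getnframes() / rate, rate

def check_wav(source):
    """Return a list of issues; an empty list means the file passes these checks."""
    duration, rate = inspect_wav(source)
    issues = []
    if duration > MAX_DURATION_SECONDS:
        issues.append(f"audio is {duration:.1f}s, limit is {MAX_DURATION_SECONDS}s")
    if rate not in RECOMMENDED_RATES:
        issues.append(f"sample rate {rate} Hz is outside the recommended 44.1/48 kHz")
    return issues

# Demo: one second of 16-bit mono silence written to an in-memory buffer
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(44100)
    out.writeframes(b"\x00\x00" * 44100)

buffer.seek(0)
print(check_wav(buffer))
```

In practice you would pass the path of the file you intend to upload, for example check_wav("speech.wav").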

Prompt Requirements

Positive Prompt: Describe the desired avatar motion and expressions (max 512 characters)
Negative Prompt: Describe what to avoid in the motion generation (max 512 characters)
Examples:
- Positive: "A confident speaker gesturing with hands while explaining concepts"
- Negative: "static pose, no movement, blurry"
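
The 512-character limit is easy to enforce client-side before a request is sent. This is a minimal sketch; validate_prompts is a hypothetical helper, and it assumes the limit counts characters the way Python's len does, which the requirements above do not spell out.

```python
MAX_PROMPT_CHARS = 512

def validate_prompts(positive_prompt, negative_prompt):
    """Raise ValueError if either prompt exceeds the 512-character limit."""
    prompts = (
        ("positive_prompt", positive_prompt),
        ("negative_prompt", negative_prompt),
    )
    for name, prompt in prompts:
        if len(prompt) > MAX_PROMPT_CHARS:
            raise ValueError(
                f"{name} is {len(prompt)} characters; the limit is {MAX_PROMPT_CHARS}"
            )

validate_prompts(
    "A confident speaker gesturing with hands while explaining concepts",
    "static pose, no movement, blurry",
)
print("prompts are within the limit")
```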

Polling for Video Status

python

def check_video_status(task_id):
    """Check avatar motion generation status"""

    response = requests.get(
        f"{BASE_URL}/v1/video/async_generate_avatar_motion/{task_id}/status",
        headers=headers,
        timeout=30  # Avoid hanging indefinitely on a stalled connection
    )

    if response.status_code == 200:
        result = response.json()['data']
        status = result['status']

        print(f"Status: {status}")

        if status == "COMPLETED":
            video_url = result.get('file_url')
            duration = result.get('output_duration')
            print("🎉 Video generation completed!")
            print(f"Video URL: {video_url}")
            print(f"Duration: {duration} seconds")
            return {"status": "completed", "video_url": video_url, "duration": duration}

        elif status in ["IN_QUEUE", "IN_PROGRESS"]:
            print("⏳ Video generation in progress...")
            return {"status": "processing"}

        elif status in ["FAILED", "CANCELED", "TIMED_OUT"]:
            print(f"❌ Video generation failed: {status}")
            return {"status": "failed"}

        else:
            print(f"Unknown status: {status}")
            return {"status": "unknown"}
    else:
        print(f"Error checking status: {response.text}")
        return {"status": "error"}

def wait_for_video_completion(task_id, max_wait_time=300):  # 5 minutes
    """Wait for video generation to complete"""

    start_time = time.time()

    while time.time() - start_time < max_wait_time:
        result = check_video_status(task_id)

        if result["status"] == "completed":
            return result
        elif result["status"] == "failed":
            return None

        # Wait 10 seconds before next check
        time.sleep(10)

    print("⏰ Timeout: Video didn't complete within time limit")
    return None

# Wait for completion
if task_id:
    video_result = wait_for_video_completion(task_id)

    if video_result:
        print(f"✅ Video ready: {video_result['video_url']}")
    else:
        print("❌ Video generation failed or timed out")
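
The loop above polls at a fixed 10-second interval. A common alternative, not something the Mirako documentation prescribes, is to start with short waits and lengthen them so quick jobs return sooner and long jobs generate fewer requests. This is a minimal sketch of such a schedule; backoff_delays and its default values are illustrative assumptions.

```python
def backoff_delays(initial=5.0, factor=1.5, cap=30.0, budget=300.0):
    """Yield sleep intervals that grow geometrically, capped, until the budget is spent."""
    delay = initial
    spent = 0.0
    while spent < budget:
        # Never sleep longer than the cap or past the remaining budget
        step = min(delay, cap, budget - spent)
        yield step
        spent += step
        delay *= factor

delays = list(backoff_delays())
print(delays[:5])
print(sum(delays))
```

Each yielded value could replace the fixed time.sleep(10) in a loop like wait_for_video_completion, with the budget playing the role of max_wait_time.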

Webhook Support

Webhook callbacks are useful in serverless environments, where a long-running polling process is not ideal.

python

def create_avatar_motion_with_webhook(image_path, audio_path, positive_prompt, negative_prompt, webhook_url, webhook_auth_token=None):
    """Create avatar motion with webhook notification"""

    # Encode files
    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode('utf-8')

    with open(audio_path, "rb") as f:
        audio_data = base64.b64encode(f.read()).decode('utf-8')

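    # Only include the optional auth token when one was provided
    webhook = {"url": webhook_url}
    if webhook_auth_token is not None:
        webhook["auth_token"] = webhook_auth_token

    payload = {
        "image": image_data,
        "audio": audio_data,
        "positive_prompt": positive_prompt,
        "negative_prompt": negative_prompt,
        "webhook": webhook
    }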

    response = requests.post(
        f"{BASE_URL}/v1/video/async_generate_avatar_motion",
        headers=headers,
        json=payload,
        timeout=60  # Avoid hanging indefinitely on a stalled connection
    )

    if response.status_code == 200:
        task_id = response.json()['data']['task_id']
        print("✅ Video generation started with webhook notification")
        print(f"Task ID: {task_id}")
        return task_id
    else:
        print(f"❌ Error: {response.text}")
        return None

# Use webhook for notifications
task_id = create_avatar_motion_with_webhook(
    "portrait.jpg",
    "speech.wav",
    "A happy young man laughing and gesturing enthusiastically",
    "blurry, low quality, static pose",
    "https://your-app.com/webhook/video-complete",
    "your_webhook_auth_token"
)
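
This page does not describe the request your webhook endpoint receives. The sketch below covers only the receiving side's token check, and it rests on an unconfirmed assumption: that the auth token arrives as a Bearer value in the Authorization header. Verify the actual delivery mechanism against the webhook reference before relying on it; is_authorized_webhook is a hypothetical helper.

```python
import hmac

def is_authorized_webhook(headers, expected_token):
    """Return True if the request carries the expected token.

    Assumes (unconfirmed) the token is sent as "Authorization: Bearer <token>".
    """
    presented = headers.get("Authorization", "")
    prefix = "Bearer "
    if not presented.startswith(prefix):
        return False
    # compare_digest runs in constant time, which avoids leaking the token via timing
    return hmac.compare_digest(
        presented[len(prefix):].encode("utf-8"),
        expected_token.encode("utf-8"),
    )

print(is_authorized_webhook(
    {"Authorization": "Bearer your_webhook_auth_token"},
    "your_webhook_auth_token",
))
```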

Response Video Format

The generated video is an MP4 file with H.264 encoding at 25 fps and the same dimensions as the input image.
