OpenClaw skill

avatar-video-messages

An OpenClaw skill that turns text (or an existing audio file) into a video message spoken by a lip-syncing VRM avatar. It renders an MP4 with the `avatarcam` CLI and FFmpeg, then sends it through OpenClaw's `message` tool as a Telegram video note.


Files

Review the files below to add this skill to your agents.

Security notice: review the SKILL.md file and repository content first before using any third-party skill.


SKILL.md content

---
name: video-message
description: Generate and send video messages with a lip-syncing VRM avatar. Use when user asks for video message, avatar video, video reply, or when TTS should be delivered as video instead of audio.
metadata:
  {
    "openclaw":
      {
        "emoji": "🎥",
        "requires": { "bins": ["ffmpeg", "avatarcam"] },
        "install":
          [
            {
              "id": "npm",
              "kind": "npm",
              "package": "@thewulf7/openclaw-avatarcam",
              "global": true,
              "bins": ["avatarcam"],
              "label": "Install avatarcam (npm)",
            },
            {
              "id": "brew",
              "kind": "brew",
              "formula": "ffmpeg",
              "bins": ["ffmpeg"],
              "label": "Install ffmpeg (brew)",
            },
            {
              "id": "apt",
              "kind": "apt",
              "packages": ["xvfb", "xauth"],
              "label": "Install headless X dependencies (Linux only)",
            },
          ],
      },
  }
---

# Video Message

Generate avatar video messages from text or audio. Outputs as Telegram video notes (circular format).

## Installation

```bash
npm install -g @thewulf7/openclaw-avatarcam
```

## Configuration

Configure in `TOOLS.md`:

```markdown
### Video Message (avatarcam)
- avatar: default.vrm
- background: #00FF00
```

### Settings Reference

| Setting | Default | Description |
|---------|---------|-------------|
| `avatar` | `default.vrm` | VRM avatar file path |
| `background` | `#00FF00` | Color (hex) or image path |
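A minimal sketch of step 1 of the workflow (reading the config), assuming the `TOOLS.md` layout shown above: a `### Video Message` heading followed by `- key: value` lines. `read_setting` is a hypothetical helper, not part of avatarcam or OpenClaw; it returns the value under that heading or falls back to the documented default.

```bash
# Hypothetical helper: read one setting from the "### Video Message" section of TOOLS.md.
# Usage: read_setting FILE KEY DEFAULT
read_setting() {
  local file="$1" key="$2" default="$3" value=""
  if [ -f "$file" ]; then
    value="$(awk -v key="$key" '
      # Any markdown heading starts a new section; only "### Video Message..." is ours.
      /^#+ / { in_section = ($0 ~ /^### Video Message/); next }
      # Match "- key: value" lines inside our section and print the value.
      in_section && index($0, "- " key ":") == 1 {
        sub("^- " key ":[ \t]*", "")
        sub(/[ \t\r]+$/, "")
        print
        exit
      }
    ' "$file")"
  fi
  # Fall back to the documented default when the file, section, or key is missing.
  printf '%s\n' "${value:-$default}"
}
```

For example, `avatar="$(read_setting TOOLS.md avatar default.vrm)"` yields the configured avatar, or `default.vrm` if none is set.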

## Prerequisites

### System Dependencies

| Platform | Command |
|----------|---------|
| **macOS** | `brew install ffmpeg` |
| **Linux** | `sudo apt-get install -y xvfb xauth ffmpeg` |
| **Windows** | Install ffmpeg and add to PATH |
| **Docker** | See Docker section below |

> **Note:** macOS and Windows don't need xvfb — they have native display support.

### Docker Users
Add to `OPENCLAW_DOCKER_APT_PACKAGES`:
```
build-essential procps curl file git ca-certificates xvfb xauth libgbm1 libxss1 libatk1.0-0 libatk-bridge2.0-0 libgdk-pixbuf2.0-0 libgtk-3-0 libasound2 libnss3 ffmpeg
```

## Usage

```bash
# With color background
avatarcam --audio voice.mp3 --output video.mp4 --background "#00FF00"

# With image background
avatarcam --audio voice.mp3 --output video.mp4 --background "./bg.png"

# With custom avatar
avatarcam --audio voice.mp3 --output video.mp4 --avatar "./custom.vrm"
```
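The flags above can be assembled from the configured settings with a small wrapper. This is a sketch: `render_avatar_video` and the `AVATARCAM_BIN` override are hypothetical names for this example, and it uses only the `--audio`, `--output`, `--background`, and `--avatar` flags documented here.

```bash
# Hypothetical wrapper around the documented avatarcam flags.
# AVATARCAM_BIN is a variable of this sketch only (set it to `echo` for a dry run).
# Usage: render_avatar_video AUDIO OUTPUT [AVATAR] [BACKGROUND]
render_avatar_video() {
  local audio="$1" output="$2" avatar="${3:-}"
  local default_background='#00FF00'
  local background="${4:-$default_background}"

  if [ ! -f "$audio" ]; then
    echo "audio file not found: $audio" >&2
    return 1
  fi

  local args=(--audio "$audio" --output "$output" --background "$background")
  # Pass --avatar only when a custom VRM is configured
  # (assumption: avatarcam has its own default, since the examples above omit it).
  if [ -n "$avatar" ]; then
    args+=(--avatar "$avatar")
  fi

  "${AVATARCAM_BIN:-avatarcam}" "${args[@]}"
}
```

Setting `AVATARCAM_BIN=echo` prints the arguments instead of rendering, which is a cheap way to check a configuration before a long run.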

## Sending as Video Note

Use OpenClaw's `message` tool with `asVideoNote`:

```
message action=send filePath=/tmp/video.mp4 asVideoNote=true
```

## Workflow

1. **Read config** from TOOLS.md (avatar, background)
2. **Generate TTS** if given text: `tts text="..."` → audio path
3. **Run avatarcam** with audio + settings → MP4 output
4. **Send as video note** via `message action=send filePath=... asVideoNote=true`
5. **Return NO_REPLY** after sending

## Example Flow

User: "Send me a video message saying hello"

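```
# 1. TTS (OpenClaw tool call; returns an audio file path such as /tmp/voice.mp3)
tts text="Hello! How are you today?"

# 2. Generate video (shell command)
avatarcam --audio /tmp/voice.mp3 --output /tmp/video.mp4 --background "#00FF00"

# 3. Send as video note (OpenClaw tool call)
message action=send filePath=/tmp/video.mp4 asVideoNote=true

# 4. Final reply (the video already went out, so send no text)
NO_REPLY
```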

## Technical Details

| Setting | Value |
|---------|-------|
| Resolution | 384x384 (square) |
| Frame rate | 30fps constant |
| Max duration | 60 seconds |
| Video codec | H.264 (libx264) |
| Audio codec | AAC |
| Quality | CRF 18 (high quality) |
| Container | MP4 |
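Before sending, the output can be checked against the size and duration in the table. A minimal sketch, assuming `ffprobe` (shipped with FFmpeg) is on PATH; `check_video_note` and `FFPROBE_BIN` are hypothetical names used only here.

```bash
# Hypothetical pre-send check against the specs above: 384x384 and at most 60 s.
# FFPROBE_BIN is a variable of this sketch so the check can be stubbed in tests.
check_video_note() {
  local file="$1" size duration
  size="$("${FFPROBE_BIN:-ffprobe}" -v error -select_streams v:0 \
    -show_entries stream=width,height -of csv=s=x:p=0 "$file")" || return 1
  duration="$("${FFPROBE_BIN:-ffprobe}" -v error \
    -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$file")" || return 1

  if [ "$size" != "384x384" ]; then
    echo "unexpected size: $size" >&2
    return 1
  fi
  # Shell arithmetic is integer-only, so compare the fractional duration in awk.
  if ! awk -v d="$duration" 'BEGIN { exit !(d > 0 && d <= 60) }'; then
    echo "duration out of range: $duration s" >&2
    return 1
  fi
  return 0
}
```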

### Processing Pipeline
1. Electron renders VRM avatar with lip sync at 1280x720
2. WebM captured via `canvas.captureStream(30)`
3. FFmpeg processes: crop → fps normalize → scale → encode
4. Message tool sends via Telegram `sendVideoNote` API
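Step 3 of the pipeline can be approximated with a single FFmpeg call. This is an illustrative sketch, not avatarcam's actual command: it assumes the captured WebM has no audio track (`canvas.captureStream()` yields a video-only stream) and muxes the original audio back in. `encode_video_note` and `FFMPEG_BIN` are hypothetical names.

```bash
# Illustrative approximation of pipeline step 3; NOT avatarcam's actual command.
# Assumes the captured WebM has no audio track, so the original audio is muxed back in.
# FFMPEG_BIN is a variable of this sketch only (set it to `echo` for a dry run).
# Usage: encode_video_note RENDERED_WEBM AUDIO OUTPUT_MP4
encode_video_note() {
  local webm="$1" audio="$2" output="$3"
  # crop=ih:ih -> centered square (720x720 from 1280x720; crop centers by default)
  # fps=30     -> constant 30 fps
  # scale      -> 384x384 output
  "${FFMPEG_BIN:-ffmpeg}" -y -i "$webm" -i "$audio" \
    -map 0:v:0 -map 1:a:0 \
    -vf "crop=ih:ih,fps=30,scale=384:384" \
    -c:v libx264 -crf 18 -pix_fmt yuv420p \
    -c:a aac \
    -t 60 -shortest \
    -movflags +faststart \
    "$output"
}
```

Cropping before scaling keeps the avatar's aspect ratio, and normalizing to 30 fps before encoding gives the constant frame rate listed in the table. `-t 60` enforces the maximum duration, and `-pix_fmt yuv420p` keeps the H.264 stream playable on most clients.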

## Platform Support

| Platform | Display | Notes |
|----------|---------|-------|
| macOS | Native Quartz | No extra deps |
| Linux | xvfb (headless) | `apt install xvfb xauth` |
| Windows | Native | No extra deps |

## Headless Rendering

Avatarcam auto-detects headless environments:
- Uses `xvfb-run` when `$DISPLAY` is not set (Linux only)
- macOS/Windows use native display
- GPU stall warnings are safe to ignore
- Generation time: ~1.5x realtime (20s audio ≈ 30s processing)
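The detection rule and the time estimate above can be written as two small functions. This is a sketch of the described behavior, not avatarcam's source; `display_prefix` and `estimate_processing_seconds` are hypothetical names.

```bash
# Sketch of the documented headless rule; NOT avatarcam's source.
# Prints "xvfb-run -a" on Linux when no display is available, nothing otherwise.
# Usage: display_prefix [OS] [DISPLAY_VALUE]   (defaults: uname -s, $DISPLAY)
display_prefix() {
  local os="${1:-$(uname -s)}"
  local display="${2-${DISPLAY:-}}"
  if [ "$os" = "Linux" ] && [ -z "$display" ]; then
    # -a (--auto-servernum) picks a free X server number.
    echo "xvfb-run -a"
  fi
}

# Rough processing time at ~1.5x realtime, in whole seconds (integer math rounds down).
estimate_processing_seconds() {
  local audio_seconds="$1"
  echo $(( audio_seconds * 3 / 2 ))
}
```

Since avatarcam already performs this check itself, `display_prefix` is only useful in custom wrappers that launch other GUI programs. The estimate is handy for telling a user roughly how long to wait.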

## Notes

- Config is read from TOOLS.md
- Clean up temp files after sending: `rm /tmp/video*.mp4`
- For regular video (not circular), omit `asVideoNote=true`
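The blanket `rm /tmp/video*.mp4` can delete another request's output if two messages are rendered at once. One alternative, sketched here assuming a `mktemp` that supports `-d` (GNU coreutils and BSD/macOS both do), is to give each request its own directory and remove exactly that directory once the send succeeds. `make_workdir` and `cleanup_workdir` are hypothetical helpers.

```bash
# Hypothetical helpers: one private temp directory per request.
make_workdir() {
  mktemp -d "${TMPDIR:-/tmp}/avatarcam.XXXXXX"
}

# Remove a directory created by make_workdir, refusing any other path.
cleanup_workdir() {
  local dir="$1"
  case "$dir" in
    *..*)
      echo "refusing to remove: $dir" >&2
      return 1
      ;;
    "${TMPDIR:-/tmp}"/avatarcam.*)
      ;;
    *)
      echo "refusing to remove: $dir" >&2
      return 1
      ;;
  esac
  # Nothing to do if it was already removed.
  [ -d "$dir" ] || return 0
  rm -rf -- "$dir"
}
```

Render into `"$workdir/video.mp4"`, pass that path to `message action=send filePath=... asVideoNote=true`, and call `cleanup_workdir "$workdir"` only after the send returns, because the file must still exist when the `message` tool reads it.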

How this skill works

The skill requires the `ffmpeg` and `avatarcam` binaries, plus xvfb and xauth on headless Linux
The skill reads the avatar (VRM file) and background (hex color or image path) from TOOLS.md
The skill accepts text, converted to audio with OpenClaw's `tts` tool, or an existing audio file
avatarcam renders the VRM avatar with lip sync in Electron, and FFmpeg encodes a 384x384 H.264/AAC MP4
The skill sends the MP4 with the `message` tool as a Telegram video note (`asVideoNote=true`)
The agent replies NO_REPLY after sending

When to use it

When generating lip-synced video messages from text using a customizable avatar
When responding to users with expressive avatar-based video communications

Best practices

Install avatarcam globally with npm and FFmpeg system-wide, and make sure both are in PATH
On headless Linux, install xvfb and xauth (Docker users: add them to OPENCLAW_DOCKER_APT_PACKAGES)
Point `avatar` in TOOLS.md at a valid VRM file
Test with short messages first to verify configuration and rendering
Keep audio within the 60-second maximum duration
Expect generation at about 1.5x realtime (20 s of audio takes about 30 s) and tell users when a long message is on its way

Example use cases

Generating greeting video messages: Create a video note of the avatar speaking a greeting, as in the example flow with 'Hello! How are you today?'
Producing personalized avatar videos: Generate short circular MP4 videos in which a customizable VRM avatar, set against a custom color or image background, speaks the provided text.

FAQs

What is the name of the skill?

avatar-video-messages

Who is the author of the skill?

thewulf7

What does this skill do?

This skill allows the OpenClaw agent to send video messages using an AI avatar. The avatar speaks the message with lip-sync.

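What does the skill provide?

Instructions rather than a dedicated tool: the agent runs the `avatarcam` CLI and sends the result with OpenClaw's `message` tool.

What inputs does avatarcam take?

--audio (input audio file) and --output (MP4 path), plus optional --background (hex color or image path) and --avatar (VRM file)

Does the skill require an API key?

SKILL.md lists none. It requires the `ffmpeg` and `avatarcam` binaries (plus xvfb and xauth on headless Linux), and text-to-speech goes through OpenClaw's `tts` tool.

What services does the skill use?

Rendering runs locally: avatarcam draws the avatar in Electron and FFmpeg encodes the MP4. Delivery uses the Telegram `sendVideoNote` API via the `message` tool.

What does the skill produce?

A 384x384 H.264/AAC MP4 sent as a Telegram video note, after which the agent replies NO_REPLY.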

More similar skills to explore

achurch
An OpenClaw skill for church administration that handles member management, event scheduling, sermon retrieval, and donation processing. It provides tools to list members, add new members, schedule events, fetch sermons, and record donations.
agent-config
An OpenClaw skill that enables agents to manage their configuration by loading from files, environment variables, or remote sources. It supports retrieving, setting, and validating configuration values. The skill allows for hot-reloading of configurations.
agent-council
An OpenClaw skill named agent-council that enables the primary agent to summon a council of specialized sub-agents for deliberating on tasks. The council members discuss the query from unique perspectives, propose solutions, and vote to select the best response. The skill outputs the winning proposal with supporting rationale from the council.
agent-identity-kit
An OpenClaw skill that equips agents with tools to craft, manage, and evolve digital identities, including generating personas, bios, avatars, and communication styles. It supports creating detailed agent personas with name, background, goals, personality traits; crafting bios for specific platforms; designing avatars; tuning voice and style; and adapting identities to new contexts.
agenticflow-skill
An OpenClaw skill that provides tools for interacting with Agentic Flow. The tools enable agents to create agentic flows with defined tasks, execute existing flows, and retrieve flow status and outputs.
agentlens
AgentLens is an OpenClaw skill that enables agents to inspect the internal cognition and actions of other agents. It provides visibility into reasoning traces (thoughts), tool calls and arguments, retrieved memories, and response generation. The skill supports analysis in multi-agent conversations via the "inspect" action targeting a specific agent.