# AI Voicebot

A WebRTC-enabled AI voicebot system with speech recognition and synthetic media capabilities. The voicebot can run in two modes: as a client connecting to lobbies or as a provider serving bots to other applications.

## Features

- **Speech Recognition**: Uses Whisper models for real-time audio transcription
- **Synthetic Media**: Generates animated video and audio tracks
- **WebRTC Integration**: Real-time peer-to-peer communication
- **Bot Provider System**: Can register with a main server to provide bot services
- **Flexible Deployment**: Docker-based with development and production modes

## Quick Start

### Prerequisites

- Docker and Docker Compose
- Python 3.12+ (if running locally)
- Access to a compatible signaling server

### Running with Docker

#### 1. Bot Provider Mode (Recommended)

Run the voicebot as a bot provider that registers with the main server:

```bash
# Development mode with auto-reload
VOICEBOT_MODE=provider PRODUCTION=false docker-compose up voicebot

# Production mode
VOICEBOT_MODE=provider PRODUCTION=true docker-compose up voicebot
```

#### 2. Direct Client Mode

Run the voicebot as a direct client connecting to a lobby:

```bash
# Development mode
VOICEBOT_MODE=client PRODUCTION=false docker-compose up voicebot

# Production mode
VOICEBOT_MODE=client PRODUCTION=true docker-compose up voicebot
```

### Running Locally

#### 1. Setup Environment

```bash
cd voicebot/

# Create virtual environment
uv init --python /usr/bin/python3.12 --name "ai-voicebot-agent"
uv add -r requirements.txt

# Activate environment
source .venv/bin/activate
```

#### 2. Bot Provider Mode

```bash
# Development with auto-reload
python main.py --mode provider --server-url https://your-server.com/ai-voicebot --reload --insecure

# Production
python main.py --mode provider --server-url https://your-server.com/ai-voicebot
```

#### 3. Direct Client Mode

```bash
python main.py --mode client \
  --server-url https://your-server.com/ai-voicebot \
  --lobby "my-lobby" \
  --session-name "My Bot" \
  --insecure
```

## Configuration

### Environment Variables

| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `VOICEBOT_MODE` | Operating mode: `client` or `provider` | `client` | `provider` |
| `PRODUCTION` | Production mode flag | `false` | `true` |

### Command Line Arguments

#### Common Arguments

- `--mode`: Run as `client` or `provider`
- `--server-url`: Main server URL
- `--insecure`: Allow insecure SSL connections
- `--help`: Show all available options

#### Provider Mode Arguments

- `--host`: Host to bind the provider server (default: `0.0.0.0`)
- `--port`: Port for the provider server (default: `8788`)
- `--reload`: Enable auto-reload for development

#### Client Mode Arguments

- `--lobby`: Lobby name to join (default: `default`)
- `--session-name`: Display name for the bot (default: `Python Bot`)
- `--session-id`: Existing session ID to reuse
- `--password`: Password for protected names
- `--private`: Create/join private lobby

## Available Bots

The voicebot system includes the following bot types:

### 1. Whisper Bot

- **Name**: `whisper`
- **Description**: Speech recognition agent using OpenAI Whisper models
- **Capabilities**: Real-time audio transcription, multiple language support
- **Models**: Supports various Whisper and Distil-Whisper models

### 2. Synthetic Media Bot

- **Name**: `synthetic_media`
- **Description**: Generates animated video and audio tracks
- **Capabilities**: Animated video generation, synthetic audio, edge detection on incoming video

## Architecture

### Bot Provider System

```
┌────────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Main Server     │    │   Bot Provider   │    │   Client App    │
│                    │◄───┤    (Voicebot)    │    │                 │
│ - Bot Registry     │    │ - Whisper Bot    │    │ - Bot Manager   │
│ - Lobby Management │    │ - Synthetic Bot  │    │ - UI Controls   │
│ - API Endpoints    │    │ - API Server     │    │ - Lobby View    │
└────────────────────┘    └──────────────────┘    └─────────────────┘
```

### Flow

1. Voicebot registers as bot provider with main server
2. Main server discovers available bots from providers
3. Client requests bot to join lobby via main server
4. Main server forwards request to appropriate provider
5. Provider creates bot instance that connects to the lobby

## Development

### Auto-Reload

In development mode, the bot provider supports auto-reload using uvicorn:

```bash
# Watches /voicebot and /shared directories for changes
python main.py --mode provider --reload
```

### Adding New Bots

1. Create a new module in `voicebot/bots/`
2. Implement required functions:

   ```python
   def agent_info() -> dict:
       return {"name": "my_bot", "description": "My custom bot"}

   def create_agent_tracks(session_name: str) -> dict:
       # Return MediaStreamTrack instances
       return {"audio": my_audio_track, "video": my_video_track}
   ```

3. The bot will be automatically discovered and available

### Testing

```bash
# Test bot discovery
python test_bot_api.py

# Test client connection
python main.py --mode client --lobby test --session-name "Test Bot"
```

## Production Deployment

### Docker Compose

```yaml
version: '3.8'
services:
  voicebot-provider:
    build: .
    environment:
      - VOICEBOT_MODE=provider
      - PRODUCTION=true
    ports:
      - "8788:8788"
    volumes:
      - ./cache:/voicebot/cache
```

### Kubernetes

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: voicebot-provider
spec:
  replicas: 1
  selector:
    matchLabels:
      app: voicebot-provider
  template:
    metadata:
      labels:
        app: voicebot-provider
    spec:
      containers:
        - name: voicebot
          image: ai-voicebot:latest
          env:
            - name: VOICEBOT_MODE
              value: "provider"
            - name: PRODUCTION
              value: "true"
          ports:
            - containerPort: 8788
```

## API Reference

### Bot Provider Endpoints

The voicebot provider exposes the following HTTP API:

- `GET /bots` - List available bots
- `POST /bots/{bot_name}/join` - Request bot to join lobby
- `GET /bots/runs` - List active bot instances
- `POST /bots/runs/{run_id}/stop` - Stop a bot instance

### Example API Usage

```bash
# List available bots
curl http://localhost:8788/bots

# Request whisper bot to join lobby
curl -X POST http://localhost:8788/bots/whisper/join \
  -H "Content-Type: application/json" \
  -d '{
    "lobby_id": "lobby-123",
    "session_id": "session-456",
    "nick": "Speech Bot",
    "server_url": "https://server.com/ai-voicebot"
  }'
```

## Troubleshooting

### Common Issues

**Bot provider not registering:**

- Check server URL is correct and accessible
- Verify network connectivity between provider and server
- Check logs for registration errors

**Auto-reload not working:**

- Ensure `--reload` flag is used in development
- Check file permissions on watched directories
- Verify uvicorn version supports reload functionality

**WebRTC connection issues:**

- Check STUN/TURN server configuration
- Verify network ports are not blocked
- Check browser console for ICE connection errors

### Logs

Logs are written to stdout and include:

- Bot registration status
- WebRTC connection events
- Media track creation/destruction
- API request/response details

### Debug Mode

Enable verbose logging:

```bash
python main.py --mode provider --server-url https://server.com --debug
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request

## License

This project is licensed under the MIT License - see the LICENSE file for details.
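
## Appendix: Calling the Provider API from Python

The `curl` calls under Example API Usage can also be scripted. The following is a minimal sketch using only the Python standard library, not part of the project itself. The helper names `build_join_request` and `send_request` are hypothetical, the payload fields are copied from the `curl` example, and the assumption that the provider replies with a JSON body is not confirmed by this README.

```python
import json
import urllib.parse
import urllib.request


def build_join_request(
    base_url: str,
    bot_name: str,
    lobby_id: str,
    session_id: str,
    nick: str,
    server_url: str,
) -> urllib.request.Request:
    """Build (but do not send) a POST /bots/{bot_name}/join request."""
    # Payload fields mirror the curl example in this README.
    payload = {
        "lobby_id": lobby_id,
        "session_id": session_id,
        "nick": nick,
        "server_url": server_url,
    }
    # Percent-encode the bot name so it is safe as a single path segment.
    path = f"/bots/{urllib.parse.quote(bot_name, safe='')}/join"
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(request: urllib.request.Request, timeout: float = 10.0):
    """Send a prepared request and decode the reply.

    Assumption: the provider answers with a JSON document.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no network access, so it can be inspected first.
request = build_join_request(
    base_url="http://localhost:8788",
    bot_name="whisper",
    lobby_id="lobby-123",
    session_id="session-456",
    nick="Speech Bot",
    server_url="https://server.com/ai-voicebot",
)
print(request.get_method(), request.full_url)
```

With a provider listening on port `8788`, calling `send_request(request)` would submit the join request. Keeping request construction separate from sending makes the payload easy to check in tests without a running provider.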