AI Voicebot
A WebRTC-enabled AI voicebot system with speech recognition and synthetic media capabilities. The voicebot can run in two modes: as a client connecting to lobbies or as a provider serving bots to other applications.
Features
- Speech Recognition: Uses Whisper models for real-time audio transcription
- Synthetic Media: Generates animated video and audio tracks
- WebRTC Integration: Real-time peer-to-peer communication
- Bot Provider System: Can register with a main server to provide bot services
- Flexible Deployment: Docker-based with development and production modes
Quick Start
Prerequisites
- Docker and Docker Compose
- Python 3.12+ (if running locally)
- Access to a compatible signaling server
Running with Docker
1. Bot Provider Mode (Recommended)
Run the voicebot as a bot provider that registers with the main server:
# Development mode with auto-reload
VOICEBOT_MODE=provider PRODUCTION=false docker-compose up voicebot
# Production mode
VOICEBOT_MODE=provider PRODUCTION=true docker-compose up voicebot
2. Direct Client Mode
Run the voicebot as a direct client connecting to a lobby:
# Development mode
VOICEBOT_MODE=client PRODUCTION=false docker-compose up voicebot
# Production mode
VOICEBOT_MODE=client PRODUCTION=true docker-compose up voicebot
Running Locally
1. Setup Environment
cd voicebot/
# Create virtual environment
uv init --python /usr/bin/python3.12 --name "ai-voicebot-agent"
uv add -r requirements.txt
# Activate environment
source .venv/bin/activate
2. Bot Provider Mode
# Development with auto-reload
python main.py --mode provider --server-url https://your-server.com/ai-voicebot --reload --insecure
# Production
python main.py --mode provider --server-url https://your-server.com/ai-voicebot
3. Direct Client Mode
python main.py --mode client \
--server-url https://your-server.com/ai-voicebot \
--lobby "my-lobby" \
--session-name "My Bot" \
--insecure
Configuration
Environment Variables
Variable | Description | Default | Example |
---|---|---|---|
VOICEBOT_MODE | Operating mode: client or provider | client | provider |
PRODUCTION | Production mode flag | false | true |
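As a rough sketch of how these two variables might be read at startup: the helper name `load_settings`, the `Settings` class, and the accepted truthy spellings are assumptions for illustration, not taken from the voicebot source.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    mode: str          # "client" or "provider"
    production: bool   # True when PRODUCTION is set to a truthy value


def load_settings(env=None) -> Settings:
    """Read VOICEBOT_MODE and PRODUCTION, applying the documented defaults.

    Hypothetical helper: the real entrypoint may parse these differently.
    """
    env = os.environ if env is None else env
    mode = env.get("VOICEBOT_MODE", "client").strip().lower()
    if mode not in ("client", "provider"):
        raise ValueError(f"VOICEBOT_MODE must be 'client' or 'provider', got {mode!r}")
    # Assumption: "true", "1" and "yes" all count as enabled.
    production = env.get("PRODUCTION", "false").strip().lower() in ("true", "1", "yes")
    return Settings(mode=mode, production=production)


if __name__ == "__main__":
    print(load_settings())
```

Rejecting unknown modes early surfaces a typo in VOICEBOT_MODE as a clear startup error instead of a silent fallback to client mode.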
Command Line Arguments
Common Arguments
- --mode: Run as client or provider
- --server-url: Main server URL
- --insecure: Allow insecure SSL connections
- --help: Show all available options
Provider Mode Arguments
- --host: Host to bind the provider server (default: 0.0.0.0)
- --port: Port for the provider server (default: 8788)
- --reload: Enable auto-reload for development
Client Mode Arguments
- --lobby: Lobby name to join (default: default)
- --session-name: Display name for the bot (default: Python Bot)
- --session-id: Existing session ID to reuse
- --password: Password for protected names
- --private: Create/join private lobby
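The flags above could be declared with `argparse` roughly as follows. This is a sketch only: the function name `build_parser` and the default of `client` for `--mode` are assumptions, and the real main.py may define additional options or validate them differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented flags; the real main.py may define more options."""
    parser = argparse.ArgumentParser(prog="main.py", description="AI voicebot (sketch)")
    # Common arguments (the default mode is an assumption)
    parser.add_argument("--mode", choices=["client", "provider"], default="client")
    parser.add_argument("--server-url", help="Main server URL")
    parser.add_argument("--insecure", action="store_true", help="Allow insecure SSL connections")
    # Provider mode arguments
    parser.add_argument("--host", default="0.0.0.0", help="Host to bind the provider server")
    parser.add_argument("--port", type=int, default=8788, help="Port for the provider server")
    parser.add_argument("--reload", action="store_true", help="Enable auto-reload")
    # Client mode arguments
    parser.add_argument("--lobby", default="default", help="Lobby name to join")
    parser.add_argument("--session-name", default="Python Bot", help="Display name for the bot")
    parser.add_argument("--session-id", help="Existing session ID to reuse")
    parser.add_argument("--password", help="Password for protected names")
    parser.add_argument("--private", action="store_true", help="Create/join private lobby")
    return parser


args = build_parser().parse_args(
    ["--mode", "client", "--server-url", "https://your-server.com/ai-voicebot", "--lobby", "my-lobby"]
)
print(args.mode, args.lobby, args.session_name, args.port)
```

Note that `argparse` converts dashes to underscores, so `--session-name` is read back as `args.session_name`.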
Available Bots
The voicebot system includes the following bot types:
1. Whisper Bot
- Name: whisper
- Description: Speech recognition agent using OpenAI Whisper models
- Capabilities: Real-time audio transcription, multiple language support
- Models: Supports various Whisper and Distil-Whisper models
2. Synthetic Media Bot
- Name: synthetic_media
- Description: Generates animated video and audio tracks
- Capabilities: Animated video generation, synthetic audio, edge detection on incoming video
Architecture
Bot Provider System
┌────────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ Main Server        │    │ Bot Provider     │    │ Client App      │
│                    │◄───┤ (Voicebot)       │    │                 │
│ - Bot Registry     │    │ - Whisper Bot    │    │ - Bot Manager   │
│ - Lobby Management │    │ - Synthetic Bot  │    │ - UI Controls   │
│ - API Endpoints    │    │ - API Server     │    │ - Lobby View    │
└────────────────────┘    └──────────────────┘    └─────────────────┘
Flow
1. Voicebot registers as bot provider with main server
2. Main server discovers available bots from providers
3. Client requests bot to join lobby via main server
4. Main server forwards request to appropriate provider
5. Provider creates bot instance that connects to the lobby
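The five steps of the flow can be modelled in memory to make the division of responsibilities concrete. This is an illustrative sketch, not the project's code: the class names `MainServer` and `Provider`, their methods, and the run ID format are all hypothetical, and in the real system each call would be an HTTP request between separate processes.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Stands in for a voicebot process running in provider mode."""
    provider_id: str
    bots: list[str]
    runs: list[dict] = field(default_factory=list)

    def join(self, bot_name: str, lobby_id: str) -> dict:
        # Step 5: the provider creates a bot instance for the lobby.
        if bot_name not in self.bots:
            raise KeyError(f"unknown bot: {bot_name}")
        run = {"run_id": f"{self.provider_id}-{len(self.runs) + 1}",
               "bot": bot_name, "lobby_id": lobby_id}
        self.runs.append(run)
        return run


@dataclass
class MainServer:
    """Stands in for the main server's bot registry."""
    providers: dict[str, Provider] = field(default_factory=dict)

    def register(self, provider: Provider) -> None:
        # Step 1: a provider announces itself to the main server.
        self.providers[provider.provider_id] = provider

    def list_bots(self) -> dict[str, str]:
        # Step 2: map each bot name to the provider that offers it.
        return {bot: p.provider_id for p in self.providers.values() for bot in p.bots}

    def request_join(self, bot_name: str, lobby_id: str) -> dict:
        # Steps 3-4: a client asks for a bot; the server forwards to its provider.
        provider_id = self.list_bots()[bot_name]
        return self.providers[provider_id].join(bot_name, lobby_id)


server = MainServer()
server.register(Provider("voicebot-1", ["whisper", "synthetic_media"]))
run = server.request_join("whisper", "lobby-123")
print(run)
```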
Development
Auto-Reload
In development mode, the bot provider supports auto-reload using uvicorn:
# Watches /voicebot and /shared directories for changes
python main.py --mode provider --reload
Adding New Bots
- Create a new module in voicebot/bots/
- Implement required functions:
def agent_info() -> dict:
    return {"name": "my_bot", "description": "My custom bot"}

def create_agent_tracks(session_name: str) -> dict:
    # Return MediaStreamTrack instances (my_audio_track and my_video_track
    # are placeholders for the tracks your bot creates)
    return {"audio": my_audio_track, "video": my_video_track}
- The bot will be automatically discovered and available
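The README does not show how discovery is implemented. One common approach, sketched here as an assumption, is to scan the package with `pkgutil` and keep every module that exposes both required functions. The function name `discover_bots` and the demo package `demo_bots` are hypothetical.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

REQUIRED = ("agent_info", "create_agent_tracks")


def discover_bots(package_name: str) -> dict[str, dict]:
    """Import every module in a package and collect those exposing the bot API."""
    package = importlib.import_module(package_name)
    found = {}
    for module_info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package_name}.{module_info.name}")
        if all(callable(getattr(module, name, None)) for name in REQUIRED):
            info = module.agent_info()
            found[info["name"]] = info
    return found


# Demo: build a throwaway package with one valid bot and one unrelated module.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_bots"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "my_bot.py").write_text(
    "def agent_info():\n"
    "    return {'name': 'my_bot', 'description': 'My custom bot'}\n"
    "def create_agent_tracks(session_name):\n"
    "    return {}\n"
)
(pkg / "helpers.py").write_text("VALUE = 1\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the freshly written files are seen

bots = discover_bots("demo_bots")
print(bots)
```

Modules that lack either function, such as `helpers.py` in the demo, are skipped rather than treated as errors, so shared utilities can live alongside bot modules.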
Testing
# Test bot discovery
python test_bot_api.py
# Test client connection
python main.py --mode client --lobby test --session-name "Test Bot"
Production Deployment
Docker Compose
version: '3.8'
services:
  voicebot-provider:
    build: .
    environment:
      - VOICEBOT_MODE=provider
      - PRODUCTION=true
    ports:
      - "8788:8788"
    volumes:
      - ./cache:/voicebot/cache
Kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: voicebot-provider
spec:
  replicas: 1
  selector:
    matchLabels:
      app: voicebot-provider
  template:
    metadata:
      labels:
        app: voicebot-provider
    spec:
      containers:
      - name: voicebot
        image: ai-voicebot:latest
        env:
        - name: VOICEBOT_MODE
          value: "provider"
        - name: PRODUCTION
          value: "true"
        ports:
        - containerPort: 8788
API Reference
Bot Provider Endpoints
The voicebot provider exposes the following HTTP API:
- GET /bots: List available bots
- POST /bots/{bot_name}/join: Request bot to join lobby
- GET /bots/runs: List active bot instances
- POST /bots/runs/{run_id}/stop: Stop a bot instance
Example API Usage
# List available bots
curl http://localhost:8788/bots
# Request whisper bot to join lobby
curl -X POST http://localhost:8788/bots/whisper/join \
-H "Content-Type: application/json" \
-d '{
"lobby_id": "lobby-123",
"session_id": "session-456",
"nick": "Speech Bot",
"server_url": "https://server.com/ai-voicebot"
}'
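The same join request can be built from Python with only the standard library. This sketch assumes the provider replies with a JSON body, which the README does not state; the helper names `build_join_request` and `send` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8788"  # provider address used in the curl examples


def build_join_request(bot_name: str, lobby_id: str, session_id: str,
                       nick: str, server_url: str) -> urllib.request.Request:
    """Build the POST /bots/{bot_name}/join request without sending it."""
    payload = {"lobby_id": lobby_id, "session_id": session_id,
               "nick": nick, "server_url": server_url}
    return urllib.request.Request(
        f"{BASE_URL}/bots/{bot_name}/join",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0):
    """Send a request and decode the response (assumes a JSON body)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_join_request("whisper", "lobby-123", "session-456",
                             "Speech Bot", "https://server.com/ai-voicebot")
print(request.get_method(), request.full_url)
# With a provider running locally: print(send(request))
```

Separating request construction from sending keeps the payload easy to inspect or log before anything goes over the network.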
Troubleshooting
Common Issues
Bot provider not registering:
- Check server URL is correct and accessible
- Verify network connectivity between provider and server
- Check logs for registration errors
Auto-reload not working:
- Ensure the --reload flag is used in development
- Check file permissions on watched directories
- Verify uvicorn version supports reload functionality
WebRTC connection issues:
- Check STUN/TURN server configuration
- Verify network ports are not blocked
- Check browser console for ICE connection errors
Logs
Logs are written to stdout and include:
- Bot registration status
- WebRTC connection events
- Media track creation/destruction
- API request/response details
Debug Mode
Enable verbose logging:
python main.py --mode provider --server-url https://server.com --debug
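How the `--debug` flag is wired up internally is not documented. A plausible sketch, using only the standard `logging` module, maps the flag to the root log level and keeps output on stdout as described under Logs. The function name `configure_logging` and the log format are assumptions.

```python
import argparse
import logging
import sys


def configure_logging(debug: bool) -> int:
    """Send logs to stdout at DEBUG or INFO level and return the chosen level."""
    level = logging.DEBUG if debug else logging.INFO
    logging.basicConfig(
        stream=sys.stdout,
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers installed earlier
    )
    return level


parser = argparse.ArgumentParser()
parser.add_argument("--debug", action="store_true", help="Enable verbose logging")
args = parser.parse_args(["--debug"])

configure_logging(args.debug)
logging.getLogger("voicebot").debug("verbose logging enabled")
```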
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.