# AI Voicebot

A WebRTC-enabled AI voicebot system with speech recognition and synthetic media capabilities. The voicebot can run in two modes: as a client connecting to lobbies or as a provider serving bots to other applications.

## Features

- **Speech Recognition**: Uses Whisper models for real-time audio transcription
- **Synthetic Media**: Generates animated video and audio tracks
- **WebRTC Integration**: Real-time peer-to-peer communication
- **Bot Provider System**: Can register with a main server to provide bot services
- **Flexible Deployment**: Docker-based with development and production modes

## Quick Start

### Prerequisites

- Docker and Docker Compose
- Python 3.12+ (if running locally)
- Access to a compatible signaling server

### Running with Docker

#### 1. Bot Provider Mode (Recommended)

Run the voicebot as a bot provider that registers with the main server:

```bash
# Development mode with auto-reload
VOICEBOT_MODE=provider PRODUCTION=false docker-compose up voicebot

# Production mode
VOICEBOT_MODE=provider PRODUCTION=true docker-compose up voicebot
```

#### 2. Direct Client Mode

Run the voicebot as a direct client connecting to a lobby:

```bash
# Development mode
VOICEBOT_MODE=client PRODUCTION=false docker-compose up voicebot

# Production mode
VOICEBOT_MODE=client PRODUCTION=true docker-compose up voicebot
```

### Running Locally

#### 1. Setup Environment

```bash
cd voicebot/

# Initialize the project and install dependencies (creates .venv)
uv init --python /usr/bin/python3.12 --name "ai-voicebot-agent"
uv add -r requirements.txt

# Activate environment
source .venv/bin/activate
```

#### 2. Bot Provider Mode

```bash
# Development with auto-reload
python main.py --mode provider --server-url https://your-server.com/ai-voicebot --reload --insecure

# Production
python main.py --mode provider --server-url https://your-server.com/ai-voicebot
```

#### 3. Direct Client Mode

```bash
python main.py --mode client \
  --server-url https://your-server.com/ai-voicebot \
  --lobby "my-lobby" \
  --session-name "My Bot" \
  --insecure
```

## Configuration

### Environment Variables

| Variable | Description | Default | Example |
|----------|-------------|---------|---------|
| `VOICEBOT_MODE` | Operating mode: `client` or `provider` | `client` | `provider` |
| `PRODUCTION` | Production mode flag | `false` | `true` |

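As a rough illustration of how the two variables in the table might be interpreted at startup, the sketch below reads them from a mapping and applies the documented defaults. `Settings` and `load_settings` are hypothetical names rather than part of the voicebot code, and the set of strings treated as true for `PRODUCTION` is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping

VALID_MODES = ("client", "provider")
# Assumption: which strings count as "true" is not specified by the README.
TRUTHY = ("1", "true", "yes")


@dataclass(frozen=True)
class Settings:
    mode: str
    production: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read VOICEBOT_MODE and PRODUCTION, falling back to the documented defaults."""
    mode = env.get("VOICEBOT_MODE", "client").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"VOICEBOT_MODE must be one of {VALID_MODES}, got {mode!r}")
    production = env.get("PRODUCTION", "false").strip().lower() in TRUTHY
    return Settings(mode=mode, production=production)
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test, for example `load_settings({"VOICEBOT_MODE": "provider"})`.
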
### Command Line Arguments

#### Common Arguments

- `--mode`: Run as `client` or `provider`
- `--server-url`: Main server URL
- `--insecure`: Allow insecure SSL connections
- `--help`: Show all available options

#### Provider Mode Arguments

- `--host`: Host to bind the provider server (default: `0.0.0.0`)
- `--port`: Port for the provider server (default: `8788`)
- `--reload`: Enable auto-reload for development

#### Client Mode Arguments

- `--lobby`: Lobby name to join (default: `default`)
- `--session-name`: Display name for the bot (default: `Python Bot`)
- `--session-id`: Existing session ID to reuse
- `--password`: Password for protected names
- `--private`: Create/join private lobby

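The argument lists above can be mirrored with `argparse`. This is a sketch built only from the flags and defaults documented here, not a copy of the parser in `main.py`. `build_parser` is a hypothetical name, and the `client` default for `--mode` is an assumption that mirrors the `VOICEBOT_MODE` default.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser covering the flags documented in this README."""
    parser = argparse.ArgumentParser(prog="main.py")

    # Common arguments
    # Assumption: the default mode mirrors the VOICEBOT_MODE default.
    parser.add_argument("--mode", choices=["client", "provider"], default="client")
    parser.add_argument("--server-url", help="Main server URL")
    parser.add_argument("--insecure", action="store_true",
                        help="Allow insecure SSL connections")

    # Provider mode arguments
    provider = parser.add_argument_group("provider mode")
    provider.add_argument("--host", default="0.0.0.0")
    provider.add_argument("--port", type=int, default=8788)
    provider.add_argument("--reload", action="store_true")

    # Client mode arguments
    client = parser.add_argument_group("client mode")
    client.add_argument("--lobby", default="default")
    client.add_argument("--session-name", default="Python Bot")
    client.add_argument("--session-id")
    client.add_argument("--password")
    client.add_argument("--private", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Grouping the flags with `add_argument_group` only affects the `--help` output, so provider and client options are still accepted in either mode unless extra validation is added.
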
## Available Bots

The voicebot system includes the following bot types:

### 1. Whisper Bot

- **Name**: `whisper`
- **Description**: Speech recognition agent using OpenAI Whisper models
- **Capabilities**: Real-time audio transcription, multiple language support
- **Models**: Supports various Whisper and Distil-Whisper models

### 2. Synthetic Media Bot

- **Name**: `synthetic_media`
- **Description**: Generates animated video and audio tracks
- **Capabilities**: Animated video generation, synthetic audio, edge detection on incoming video

## Architecture

### Bot Provider System

```
┌────────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Main Server     │    │   Bot Provider   │    │   Client App    │
│                    │◄───┤   (Voicebot)     │    │                 │
│ - Bot Registry     │    │ - Whisper Bot    │    │ - Bot Manager   │
│ - Lobby Management │    │ - Synthetic Bot  │    │ - UI Controls   │
│ - API Endpoints    │    │ - API Server     │    │ - Lobby View    │
└────────────────────┘    └──────────────────┘    └─────────────────┘
```

### Flow

1. Voicebot registers as bot provider with main server
2. Main server discovers available bots from providers
3. Client requests bot to join lobby via main server
4. Main server forwards request to appropriate provider
5. Provider creates bot instance that connects to the lobby

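The five steps above can be sketched as a small in-memory model. It involves no networking and does not reflect the real server or provider code: `MainServer`, `BotProvider`, their methods, and the run ID format are all hypothetical and exist only to show how the steps relate.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class BotProvider:
    """Hypothetical stand-in for a voicebot running in provider mode."""
    name: str
    bots: list[str]
    runs: dict[str, dict] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    def join(self, bot_name: str, lobby_id: str) -> str:
        # Step 5: create a bot instance bound to the requested lobby.
        if bot_name not in self.bots:
            raise KeyError(f"unknown bot: {bot_name}")
        run_id = f"{self.name}-run-{next(self._ids)}"
        self.runs[run_id] = {"bot": bot_name, "lobby": lobby_id}
        return run_id


@dataclass
class MainServer:
    """Hypothetical stand-in for the main server's bot registry."""
    providers: list[BotProvider] = field(default_factory=list)

    def register(self, provider: BotProvider) -> None:
        # Step 1: a provider registers with the main server.
        self.providers.append(provider)

    def discover(self) -> dict[str, BotProvider]:
        # Step 2: collect the bots offered by every registered provider.
        return {bot: p for p in self.providers for bot in p.bots}

    def request_bot(self, bot_name: str, lobby_id: str) -> str:
        # Steps 3 and 4: a client request is forwarded to the right provider.
        return self.discover()[bot_name].join(bot_name, lobby_id)
```

For example, after `server.register(BotProvider("voicebot", ["whisper"]))`, a call to `server.request_bot("whisper", "lobby-123")` returns a run ID tracked by that provider.
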
## Development

### Auto-Reload

In development mode, the bot provider supports auto-reload using uvicorn:

```bash
# Watches /voicebot and /shared directories for changes
python main.py --mode provider --reload
```

### Adding New Bots

1. Create a new module in `voicebot/bots/`
2. Implement required functions:

   ```python
   def agent_info() -> dict:
       return {"name": "my_bot", "description": "My custom bot"}

   def create_agent_tracks(session_name: str) -> dict:
       # Return MediaStreamTrack instances; my_audio_track and my_video_track
       # are placeholders for the tracks your bot creates.
       return {"audio": my_audio_track, "video": my_video_track}
   ```

3. The bot will be automatically discovered and available

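Step 3 relies on automatic discovery. One common way to implement that in Python is to scan a package with `pkgutil.iter_modules` and keep the modules that expose both required functions. The sketch below shows that technique under the assumption that bots live in an importable package. It is not the project's actual discovery code, and `discover_bots` is a hypothetical helper.

```python
import importlib
import pkgutil
from types import ModuleType

# The two functions every bot module is expected to define (see step 2).
REQUIRED_FUNCTIONS = ("agent_info", "create_agent_tracks")


def discover_bots(package: ModuleType) -> dict[str, ModuleType]:
    """Map bot name to module for each submodule that defines the required functions."""
    bots: dict[str, ModuleType] = {}
    prefix = package.__name__ + "."
    for module_info in pkgutil.iter_modules(package.__path__, prefix):
        module = importlib.import_module(module_info.name)
        if all(callable(getattr(module, fn, None)) for fn in REQUIRED_FUNCTIONS):
            # The bot's public name comes from its own agent_info() result.
            bots[module.agent_info()["name"]] = module
    return bots
```

Modules that lack either function, such as shared helpers, are skipped silently, so adding a bot only requires dropping a conforming module into the package.
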
### Testing

```bash
# Test bot discovery
python test_bot_api.py

# Test client connection
python main.py --mode client --lobby test --session-name "Test Bot"
```

## Production Deployment

### Docker Compose

```yaml
version: '3.8'
services:
  voicebot-provider:
    build: .
    environment:
      - VOICEBOT_MODE=provider
      - PRODUCTION=true
    ports:
      - "8788:8788"
    volumes:
      - ./cache:/voicebot/cache
```

### Kubernetes

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: voicebot-provider
spec:
  replicas: 1
  selector:
    matchLabels:
      app: voicebot-provider
  template:
    metadata:
      labels:
        app: voicebot-provider
    spec:
      containers:
      - name: voicebot
        image: ai-voicebot:latest
        env:
        - name: VOICEBOT_MODE
          value: "provider"
        - name: PRODUCTION
          value: "true"
        ports:
        - containerPort: 8788
```

## API Reference

### Bot Provider Endpoints

The voicebot provider exposes the following HTTP API:

- `GET /bots` - List available bots
- `POST /bots/{bot_name}/join` - Request bot to join lobby
- `GET /bots/runs` - List active bot instances
- `POST /bots/runs/{run_id}/stop` - Stop a bot instance

### Example API Usage

```bash
# List available bots
curl http://localhost:8788/bots

# Request whisper bot to join lobby
curl -X POST http://localhost:8788/bots/whisper/join \
  -H "Content-Type: application/json" \
  -d '{
    "lobby_id": "lobby-123",
    "session_id": "session-456",
    "nick": "Speech Bot",
    "server_url": "https://server.com/ai-voicebot"
  }'
```

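The same join request can be issued from Python using only the standard library. This sketch reuses the endpoint path and JSON fields from the `curl` example. `build_join_request` and `send` are hypothetical helpers, and treating the response body as JSON is an assumption, since the response format is not documented here.

```python
import json
import urllib.parse
import urllib.request


def build_join_request(provider_url: str, bot_name: str, lobby_id: str,
                       session_id: str, nick: str,
                       server_url: str) -> urllib.request.Request:
    """Build the POST /bots/{bot_name}/join request without sending it."""
    payload = {
        "lobby_id": lobby_id,
        "session_id": session_id,
        "nick": nick,
        "server_url": server_url,
    }
    path = f"/bots/{urllib.parse.quote(bot_name, safe='')}/join"
    return urllib.request.Request(
        url=provider_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request; assumes (not confirmed) that the provider replies with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating request construction from sending makes it possible to inspect or log the request before any network traffic occurs, for example `build_join_request("http://localhost:8788", "whisper", "lobby-123", "session-456", "Speech Bot", "https://server.com/ai-voicebot")`.
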
## Troubleshooting

### Common Issues

**Bot provider not registering:**

- Check that the server URL is correct and reachable
- Verify network connectivity between provider and server
- Check logs for registration errors

**Auto-reload not working:**

- Ensure the `--reload` flag is used in development
- Check file permissions on watched directories
- Verify that the installed uvicorn version supports reload functionality

**WebRTC connection issues:**

- Check STUN/TURN server configuration
- Verify that network ports are not blocked
- Check the browser console for ICE connection errors

### Logs

Logs are written to stdout and include:

- Bot registration status
- WebRTC connection events
- Media track creation/destruction
- API request/response details

### Debug Mode

Enable verbose logging:

```bash
python main.py --mode provider --server-url https://server.com --debug
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request

## License

This project is licensed under the MIT License - see the LICENSE file for details.