James Ketrenos e0548a128c When backend services (server or voicebot) restart, active frontend UIs become unable to add bots, resulting in:

```
POST https://ketrenos.com/ai-voicebot/api/bots/ai_chatbot/join 404 (Not Found)
```

The issue was caused by three main problems:

1. **Incorrect Provider Registration Check**: The voicebot service was checking provider registration using the wrong API endpoint (`/api/bots` instead of `/api/bots/providers`)

2. **No Persistence for Bot Providers**: Bot providers were stored only in memory and lost on server restart, requiring re-registration

3. **AsyncIO Task Initialization Issue**: The cleanup task was being created during `__init__` when no event loop was running, causing FastAPI route registration failures

**File**: `voicebot/bot_orchestrator.py`

**Problem**: The `check_provider_registration` function was calling `/api/bots` (which returns available bots) instead of `/api/bots/providers` (which returns registered providers).

**Fix**: Updated the function to use the correct endpoint and parse the response properly:

```python
async def check_provider_registration(server_url: str, provider_id: str, insecure: bool = False) -> bool:
    """Check if the bot provider is still registered with the server."""
    try:
        import httpx

        verify = not insecure
        async with httpx.AsyncClient(verify=verify) as client:
            # Check if our provider is still in the provider list
            response = await client.get(f"{server_url}/api/bots/providers", timeout=5.0)
            if response.status_code == 200:
                data = response.json()
                providers = data.get("providers", [])
                # providers is a list of BotProviderModel objects, check if our provider_id is in the list
                is_registered = any(provider.get("provider_id") == provider_id for provider in providers)
                logger.debug(f"Registration check: provider_id={provider_id}, found_providers={len(providers)}, is_registered={is_registered}")
                return is_registered
            else:
                logger.warning(f"Registration check failed: HTTP {response.status_code}")
                return False
    except Exception as e:
        logger.debug(f"Provider registration check failed: {e}")
    return False
```
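The membership test at the core of this check can be exercised without a network call. The following is a minimal sketch, assuming the `/api/bots/providers` response has the shape the function above expects (a `providers` list of objects that carry `provider_id`). The helper name `is_provider_registered` and the sample payload are hypothetical.

```python
from typing import Any


def is_provider_registered(payload: dict[str, Any], provider_id: str) -> bool:
    """Return True if provider_id appears in the payload's "providers" list."""
    providers = payload.get("providers", [])
    return any(p.get("provider_id") == provider_id for p in providers)


# Hypothetical payload, shaped like the response parsed above.
sample = {"providers": [{"provider_id": "abc123", "name": "Main Voicebot"}]}

print(is_provider_registered(sample, "abc123"))   # True
print(is_provider_registered(sample, "missing"))  # False
print(is_provider_registered({}, "abc123"))       # False (no "providers" key)
```

Keeping the parsing separate from the HTTP call makes the "wrong endpoint" class of bug easier to catch in a unit test, since a payload from `/api/bots` would simply not contain the expected key.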

**File**: `server/core/bot_manager.py`

**Problem**: Bot providers were stored only in memory and lost on server restart.

**Fix**: Added persistence functionality to save/load bot providers to/from `bot_providers.json`:

```python
def _save_bot_providers(self):
    """Save bot providers to disk"""
    try:
        with self.lock:
            providers_data = {}
            for provider_id, provider in self.bot_providers.items():
                providers_data[provider_id] = provider.model_dump()

        with open(self.bot_providers_file, 'w') as f:
            json.dump(providers_data, f, indent=2)
        logger.debug(f"Saved {len(providers_data)} bot providers to {self.bot_providers_file}")
    except Exception as e:
        logger.error(f"Failed to save bot providers: {e}")

def _load_bot_providers(self):
    """Load bot providers from disk"""
    try:
        if not os.path.exists(self.bot_providers_file):
            logger.debug(f"No bot providers file found at {self.bot_providers_file}")
            return

        with open(self.bot_providers_file, 'r') as f:
            providers_data = json.load(f)

        with self.lock:
            for provider_id, provider_dict in providers_data.items():
                try:
                    provider = BotProviderModel.model_validate(provider_dict)
                    self.bot_providers[provider_id] = provider
                except Exception as e:
                    logger.warning(f"Failed to load bot provider {provider_id}: {e}")

        logger.info(f"Loaded {len(self.bot_providers)} bot providers from {self.bot_providers_file}")
    except Exception as e:
        logger.error(f"Failed to load bot providers: {e}")
```

**Integration**: The persistence functions are automatically called:
- `_load_bot_providers()` during `BotManager.__init__()`
- `_save_bot_providers()` when registering new providers or removing stale ones
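
The save/load round trip can be sketched with the standard library alone. This is an illustration under assumptions, not the code in `bot_manager.py`: providers are plain dicts (standing in for `provider.model_dump()`), the function names are hypothetical, and the temp-file-plus-`os.replace` step is an optional hardening choice that the original does not make.

```python
import json
import os
import tempfile


def save_providers(path: str, providers: dict) -> None:
    """Write providers to a temp file, then swap it into place in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(providers, f, indent=2)
        os.replace(tmp_path, path)  # readers never see a half-written file
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_providers(path: str) -> dict:
    """Return the saved providers, or an empty dict if nothing was saved yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "r") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "bot_providers.json")
    before_first_save = load_providers(path)
    save_providers(path, {"p1": {"provider_id": "p1", "name": "Main Voicebot", "last_seen": 1757100000.0}})
    restored = load_providers(path)

print(before_first_save)        # {}
print(restored["p1"]["name"])   # Main Voicebot
```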

**File**: `server/core/bot_manager.py`

**Problem**: The cleanup task was being created during `BotManager.__init__()` when no event loop was running, causing the FastAPI application to fail to register routes properly.

**Fix**: Deferred the cleanup task creation until it's actually needed:

```python
def __init__(self):
    # ... other initialization ...
    # Load persisted bot providers
    self._load_bot_providers()

    # Note: Don't start cleanup task here - will be started when needed

def start_cleanup(self):
    """Start the cleanup task"""
    try:
        if self.cleanup_task is None:
            self.cleanup_task = asyncio.create_task(self._periodic_cleanup())
            logger.debug("Bot provider cleanup task started")
    except RuntimeError:
        # No event loop running yet, cleanup will be started later
        logger.debug("No event loop available for bot provider cleanup task")

async def register_provider(self, request: BotProviderRegisterRequest) -> BotProviderRegisterResponse:
    # ... registration logic ...

    # Start cleanup task if not already running
    self.start_cleanup()

    return BotProviderRegisterResponse(provider_id=provider_id)
```
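
The underlying behavior is easy to reproduce in isolation: `asyncio.create_task()` raises `RuntimeError` when no event loop is running, and succeeds once called from inside a coroutine. The sketch below uses hypothetical names (`periodic_cleanup`, `try_start`) and a trivial coroutine in place of the real cleanup loop.

```python
import asyncio


async def periodic_cleanup() -> str:
    # Stand-in for the real cleanup loop; returns immediately.
    return "cleanup ran"


def try_start(coro_factory):
    """Create the task if a loop is running; otherwise return None."""
    coro = coro_factory()
    try:
        return asyncio.create_task(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return None


# Module level, like __init__ at import time: no loop is running.
outside = try_start(periodic_cleanup)


async def main() -> str:
    # Inside a coroutine a loop is running, so the task can be created.
    task = try_start(periodic_cleanup)
    return await task


inside = asyncio.run(main())
print(outside)  # None
print(inside)   # cleanup ran
```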

**File**: `server/core/bot_manager.py`

**Enhancement**: Added a background task that periodically removes providers that haven't been seen in 15 minutes:

```python
async def _periodic_cleanup(self):
    """Periodically clean up stale bot providers"""
    cleanup_interval = 300  # 5 minutes
    stale_threshold = 900   # 15 minutes

    while not self._shutdown_event.is_set():
        try:
            await asyncio.sleep(cleanup_interval)

            now = time.time()
            providers_to_remove = []

            with self.lock:
                for provider_id, provider in self.bot_providers.items():
                    if now - provider.last_seen > stale_threshold:
                        providers_to_remove.append(provider_id)
                        logger.info(f"Marking stale bot provider for removal: {provider.name} (ID: {provider_id}, last_seen: {now - provider.last_seen:.1f}s ago)")

            if providers_to_remove:
                with self.lock:
                    for provider_id in providers_to_remove:
                        if provider_id in self.bot_providers:
                            del self.bot_providers[provider_id]

                self._save_bot_providers()
                logger.info(f"Cleaned up {len(providers_to_remove)} stale bot providers")

        except asyncio.CancelledError:
            break
        except Exception as e:
            logger.error(f"Error in bot provider cleanup: {e}")
```
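
The staleness rule itself is a pure comparison and can be checked separately from the loop and the lock. A minimal sketch, assuming `last_seen` values are Unix timestamps as in the code above; `find_stale` and the sample data are hypothetical.

```python
STALE_THRESHOLD = 900  # seconds (15 minutes), matching the loop above


def find_stale(last_seen_by_id: dict[str, float], now: float,
               threshold: float = STALE_THRESHOLD) -> list[str]:
    """Return IDs whose last_seen is strictly older than the threshold."""
    return [pid for pid, last_seen in last_seen_by_id.items()
            if now - last_seen > threshold]


now = 10_000.0
last_seen = {
    "fresh": now - 60,        # seen one minute ago
    "borderline": now - 900,  # exactly at the threshold, so kept
    "stale": now - 901,       # one second past the threshold
}

stale_ids = find_stale(last_seen, now)
print(stale_ids)  # ['stale']
```

Because the comparison is strict (`>`), a provider seen exactly 15 minutes ago survives until the next pass.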

**File**: `client/src/BotManager.tsx`

**Enhancement**: Added retry logic to handle temporary 404s during service restarts:

```typescript
// Retry logic for handling service restart scenarios
let retries = 3;
let response;

while (retries > 0) {
  try {
    response = await botsApi.requestJoinLobby(selectedBot, request);
    break; // Success, exit retry loop
  } catch (err: any) {
    retries--;

    // If it's a 404 error and we have retries left, wait and retry
    if (err?.status === 404 && retries > 0) {
      console.log(`Bot join failed with 404, retrying... (${retries} attempts left)`);
      await new Promise(resolve => setTimeout(resolve, 1000)); // Wait 1 second
      continue;
    }

    // If it's not a 404 or we're out of retries, throw the error
    throw err;
  }
}
```
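
For non-browser callers such as a test script, the same retry policy can be sketched in Python. Everything here is illustrative: `HttpError`, `call_with_retry`, and the `flaky_join` stub (which fails twice with 404 before succeeding) are hypothetical and do not exist in the repository.

```python
import time


class HttpError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_retry(request, attempts: int = 3, delay: float = 1.0):
    """Call request(); retry on 404 until the attempts are used up."""
    for remaining in range(attempts - 1, -1, -1):
        try:
            return request()
        except HttpError as err:
            if err.status == 404 and remaining > 0:
                time.sleep(delay)  # give the backend time to come back
                continue
            raise  # not a 404, or out of retries


calls = []


def flaky_join():
    # Simulates a backend that is still restarting for the first two calls.
    calls.append(1)
    if len(calls) < 3:
        raise HttpError(404)
    return {"status": "joined"}


result = call_with_retry(flaky_join, delay=0)
print(result, len(calls))  # {'status': 'joined'} 3
```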

1. **Persistence**: Bot providers now survive server restarts and don't need to re-register immediately
2. **Correct Registration Checks**: Provider registration checks use the correct API endpoint
3. **Proper AsyncIO Task Management**: Cleanup tasks are started only when an event loop is available
4. **Automatic Cleanup**: Stale providers are automatically removed to prevent accumulation of dead entries
5. **Client Resilience**: Frontend can handle temporary 404s during service restarts with automatic retries
6. **Reduced Downtime**: Users experience fewer failed bot additions during service restarts

After implementing these fixes:

1. Bot providers are correctly persisted in `bot_providers.json`
2. Server restarts load existing providers from disk
3. Provider registration checks use the correct `/api/bots/providers` endpoint
4. AsyncIO cleanup tasks start properly without interfering with route registration
5. Client retries failed requests with 404 errors
6. Periodic cleanup prevents accumulation of stale providers
7. Bot join requests work correctly: `POST /api/bots/{bot_name}/join` returns 200 OK

Test the fix with these commands:

```bash
curl -k https://ketrenos.com/ai-voicebot/api/lobby

curl -k -X POST https://ketrenos.com/ai-voicebot/api/bots/ai_chatbot/join \
  -H "Content-Type: application/json" \
  -d '{"lobby_id":"<lobby_id>","nick":"test-bot","provider_id":"<provider_id>"}'

curl -k https://ketrenos.com/ai-voicebot/api/bots/providers

curl -k https://ketrenos.com/ai-voicebot/api/bots
```

1. `voicebot/bot_orchestrator.py` - Fixed registration check endpoint
2. `server/core/bot_manager.py` - Added persistence and cleanup
3. `client/src/BotManager.tsx` - Added retry logic

No additional configuration is required. The fixes work with existing environment variables and settings.

2025-09-05 12:25:24 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| cache | Adding whisper | 2025-09-02 14:47:31 -07:00 |
| client | When backend services (server or voicebot) restart, active frontend UIs become unable to add bots, resulting in: | 2025-09-05 12:25:24 -07:00 |
| server | When backend services (server or voicebot) restart, active frontend UIs become unable to add bots, resulting in: | 2025-09-05 12:25:24 -07:00 |
| shared | Reworking auto type system | 2025-09-05 10:17:40 -07:00 |
| tests | Error handling improvements | 2025-09-04 17:14:44 -07:00 |
| voicebot | When backend services (server or voicebot) restart, active frontend UIs become unable to add bots, resulting in: | 2025-09-05 12:25:24 -07:00 |
| .dockerignore | Bots now join on demand | 2025-09-03 15:51:47 -07:00 |
| .gitignore | Implement comprehensive chat integration for voicebot system | 2025-09-03 16:28:32 -07:00 |
| API_EVOLUTION.md | Type conversion completed | 2025-09-01 14:48:37 -07:00 |
| ARCHITECTURE_RECOMMENDATIONS.md | Midflight refactoring | 2025-09-04 15:50:33 -07:00 |
| AUTOMATED_API_CLIENT.md | Type generation | 2025-09-03 13:54:29 -07:00 |
| CHAT_INTEGRATION.md | Implement comprehensive chat integration for voicebot system | 2025-09-03 16:28:32 -07:00 |
| check-api-evolution.sh | Fix login issue | 2025-09-01 20:34:01 -07:00 |
| clean-venvs | Initial AI agent scaffolding | 2025-08-30 12:23:59 -07:00 |
| docker-compose.yml | Lots of tweaks | 2025-09-04 19:36:57 -07:00 |
| Dockerfile.client | Adding whisper | 2025-09-02 14:47:31 -07:00 |
| Dockerfile.python-3.12 | About to restructure voicebot to support dynamic agent loading | 2025-09-03 11:19:59 -07:00 |
| Dockerfile.server | Refactor: Create shared Pydantic models for API communication | 2025-09-01 13:36:24 -07:00 |
| Dockerfile.voicebot | Refactored voicebot/main.py | 2025-09-03 14:33:15 -07:00 |
| generate-ts-types.sh | Type checking working | 2025-09-05 11:33:28 -07:00 |
| README.md | Implement comprehensive chat integration for voicebot system | 2025-09-03 16:28:32 -07:00 |
| REFACTORING_STEP1_COMPLETE.md | Midflight refactoring | 2025-09-04 15:50:33 -07:00 |
| REFACTORING_STEP1_SUCCESS.md | Midflight refactoring | 2025-09-04 15:50:33 -07:00 |
| TYPESCRIPT_GENERATION.md | Fix login issue | 2025-09-01 20:34:01 -07:00 |

README.md

AI Voicebot

AI Voicebot is an AI agent that communicates over ICE and TURN, both provided by a coturn server.

coturn implements ICE and the related specifications:

RFC 5245 - ICE
RFC 5768 - ICE-SIP
RFC 6336 - ICE-IANA Registry
RFC 6544 - ICE-TCP
RFC 5928 - TURN Resolution Mechanism

To use

Set the environment variable COTURN_SERVER to point to the URL running the coturn server by modifying the .env file:

COTURN_SERVER="turns:ketrenos.com:5349"

You then launch the application, providing

Architecture

The system is broken into two major components: client and server.

client

The frontend client is written using React, exposed via a static build of the client through the server's static file endpoint.

Implementation of the client is in the client subdirectory.

Provides a Web UI for starting a human chat session. A lobby is created based on the URL, and any user with that URL can join that lobby.

The client uses RTCPeerConnection, RTCSessionDescription, RTCIceCandidate, MediaStream, navigator.getUserMedia, navigator.mediaDevices, and associated APIs for creating audio (via audio tag) and video (via video tag) media instantiations in the Web UI client.

The client also exposes the ability to add new AI "users" to the lobby. When creating a user, you can provide a brief description of the user. The server will use that description to generate an AI person, including profile picture, voice signature used for text-to-speech, etc.

server

The backend server is written in Python using the OpenAI Agentic AI SDK, and connects to an OpenAI-compatible server running at OPENAI_BASE_URL.

Implementation of the server is in the server subdirectory.

The model used by the server for LLM communication is set via OPENAI_MODEL. For example:

OPENAI_BASE_URL=http://192.168.1.198:8000/v3
OPENAI_MODEL=Qwen/Qwen3-8B
OPENAI_API_KEY=sk-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

If you want to use OpenAI instead of a self-hosted service, do not set OPENAI_BASE_URL and set OPENAI_API_KEY accordingly.
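
The rule above (an unset OPENAI_BASE_URL means "use OpenAI itself") can be expressed as a small lookup. This is a sketch only; `resolve_llm_config` and the returned dict layout are hypothetical, and how the server actually reads its settings may differ.

```python
from typing import Mapping, Optional


def resolve_llm_config(env: Mapping[str, str]) -> dict[str, Optional[str]]:
    """Collect LLM settings; base_url is None when OPENAI_BASE_URL is unset or empty."""
    return {
        "base_url": env.get("OPENAI_BASE_URL") or None,
        "model": env.get("OPENAI_MODEL"),
        "api_key": env.get("OPENAI_API_KEY"),
    }


self_hosted = resolve_llm_config({
    "OPENAI_BASE_URL": "http://192.168.1.198:8000/v3",
    "OPENAI_MODEL": "Qwen/Qwen3-8B",
    "OPENAI_API_KEY": "sk-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
})
hosted = resolve_llm_config({"OPENAI_API_KEY": "sk-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"})

print(self_hosted["base_url"])  # http://192.168.1.198:8000/v3
print(hosted["base_url"])       # None, meaning the default OpenAI endpoint
```

In a running process the same function can be called with `os.environ`.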

The server provides the AI chatbot and hosts the static files for the client frontend.

Speech-to-Text and Text-to-Speech Configuration

The server supports pluggable speech-to-text (STT) and text-to-speech (TTS) backends. To configure these, set the following environment variables in your .env file:

STT_MODEL=your-speech-to-text-model
TTS_MODEL=your-text-to-speech-model

These models are used to transcribe incoming audio and synthesize AI responses, respectively. (See future roadmap for planned model support.)

The server communicates with the coturn server in the same manner as the client, only via Python instead.

Chat Integration

The system now includes comprehensive chat functionality that allows bots to send and receive text messages through the WebSocket signaling server. This enables:

Interactive Bots: Bots can respond to text commands and questions
Multi-modal Communication: Users can interact via voice, video, and text
Command Processing: Bots can handle specific commands and provide responses
Seamless Integration: Chat works alongside existing WebRTC functionality

Available Chat-Enabled Bots:

chatbot: Simple conversational bot with greetings, jokes, and time information
whisper: Enhanced speech recognition bot that responds to chat commands

For detailed information about chat implementation and creating chat-enabled bots, see CHAT_INTEGRATION.md.

Bot Provider Configuration

The server supports authenticated bot providers to prevent unauthorized registrations. Bot providers must be configured with specific keys to be allowed to register.

Configuration

Set the following environment variables:

# Server configuration - comma-separated list of allowed provider keys
BOT_PROVIDER_KEYS="key1:Provider Name 1,key2:Provider Name 2"

# Voicebot configuration - key to use when registering
VOICEBOT_PROVIDER_KEY="key1"

Format for BOT_PROVIDER_KEYS:

key1,key2,key3 - Simple keys (names default to keys)
key1:Name1,key2:Name2 - Keys with custom names
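
Both formats can be handled by one parser. The following is a minimal sketch of that format, not the server's actual parsing code; `parse_provider_keys` is a hypothetical name, and skipping empty entries is an assumption.

```python
def parse_provider_keys(raw: str) -> dict[str, str]:
    """Parse "key1:Name1,key2" into {key: display name}; names default to the key."""
    allowed: dict[str, str] = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty input and trailing commas
        key, sep, name = entry.partition(":")
        key = key.strip()
        if not key:
            continue
        allowed[key] = name.strip() if sep and name.strip() else key
    return allowed


print(parse_provider_keys("key1:Provider Name 1,key2:Provider Name 2"))
# {'key1': 'Provider Name 1', 'key2': 'Provider Name 2'}
print(parse_provider_keys("key1,key2,key3"))
# {'key1': 'key1', 'key2': 'key2', 'key3': 'key3'}
print(parse_provider_keys(""))
# {}
```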

Behavior

Authentication Enabled: When BOT_PROVIDER_KEYS is set, only providers with valid keys can register
Authentication Disabled: When BOT_PROVIDER_KEYS is empty or not set, any provider can register
Stale Provider Cleanup: When a provider registers with an existing key, the old provider is automatically removed
Registration Validation: Invalid provider keys are rejected with HTTP 403
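
These four rules can be modeled in a few lines. The class below is an illustrative sketch, not the server's `BotManager`: `ProviderRegistry`, its fields, and the use of `uuid4` for provider IDs are all assumptions.

```python
import uuid
from typing import Optional


class ProviderRegistry:
    """Toy model of the registration rules listed above."""

    def __init__(self, allowed: dict[str, str]):
        self.allowed = allowed                    # key -> display name; empty disables auth
        self.providers: dict[str, Optional[str]] = {}  # provider_id -> provider key

    def register(self, provider_key: Optional[str]) -> tuple[int, Optional[str]]:
        # Authentication enabled: reject unknown keys with 403.
        if self.allowed and provider_key not in self.allowed:
            return 403, None
        # Stale provider cleanup: drop any earlier provider using the same key.
        if provider_key:
            for pid, key in list(self.providers.items()):
                if key == provider_key:
                    del self.providers[pid]
        provider_id = uuid.uuid4().hex
        self.providers[provider_id] = provider_key
        return 200, provider_id


secured = ProviderRegistry({"voicebot-main": "Main Voicebot"})
status1, first_id = secured.register("voicebot-main")
status2, second_id = secured.register("voicebot-main")  # replaces the first
rejected = secured.register("wrong-key")

open_registry = ProviderRegistry({})
open_status, _ = open_registry.register(None)  # auth disabled: anyone may register

print(status1, status2, rejected, open_status)  # 200 200 (403, None) 200
print(len(secured.providers))                   # 1
```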

Example

BOT_PROVIDER_KEYS="voicebot-main:Main Voicebot,voicebot-dev:Development Bot"
VOICEBOT_PROVIDER_KEY="voicebot-main"

This allows two providers: one for production (voicebot-main) and one for development (voicebot-dev).

shared

The shared/ directory contains shared Pydantic models used for API communication between the server and voicebot components. This ensures type safety and consistency across the entire application.

Key benefits:

Type Safety: All API communications are validated using Pydantic models
Consistency: Both components use identical data structures
Maintainability: Changes to data models only need to be made in one place
Documentation: Models serve as living documentation of the API

The shared models include:

Core data models (lobbies, sessions, participants)
HTTP API request/response models
WebSocket message models
WebRTC signaling models

See shared/README.md for detailed documentation.

API Communication

The server exposes an HTTP endpoint via FastAPI. This endpoint exposes the following capabilities:

Lobby creation
User management within lobby
AI agent creation for a lobby
Connection details for the voice system to attach / detach to audio coturn streams as users join / leave.

Once an AI agent is added to a lobby, it joins the audio stream(s) for that lobby.

Audio input is then passed to the speech-to-text processor to provide a stream of text with time markers.

That text is then passed to the language processing layer of the AI agent, which passes it to the LLM for a response.

The response is then passed through the text-to-speech processor, with the output stream being routed back to coturn server for dispatch to the human UI viewers.
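
The flow described above (audio in, speech-to-text, LLM, text-to-speech, audio out) can be sketched as a composition of three callables. The stubs below are placeholders rather than real STT, LLM, or TTS backends, `run_turn` is a hypothetical name, and the time markers mentioned above are omitted for brevity.

```python
from typing import Callable


def run_turn(audio: bytes,
             stt: Callable[[bytes], str],
             llm: Callable[[str], str],
             tts: Callable[[str], bytes]) -> bytes:
    """Run one conversational turn: audio -> text -> reply -> audio."""
    text = stt(audio)    # transcribe incoming audio
    reply = llm(text)    # ask the language model for a response
    return tts(reply)    # synthesize the response for the outgoing stream


# Placeholder backends so the pipeline can be run end to end.
def fake_stt(audio: bytes) -> str:
    return audio.decode()


def fake_llm(text: str) -> str:
    return f"echo:{text}"


def fake_tts(reply: str) -> bytes:
    return f"audio:{reply}".encode()


out = run_turn(b"hello", fake_stt, fake_llm, fake_tts)
print(out)  # b'audio:echo:hello'
```

Passing the stages in as callables mirrors the pluggable STT/TTS idea: a different backend only has to match the call signature.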

Lobby Features

Player Management: Players can join/leave lobbies, and their status is tracked in real time.
AI and Human Users: Both AI and human users can participate in lobbies. AI users are generated with custom profiles and voices.

Media and Peer Connection Handling

WebRTC Integration: The client uses WebRTC APIs (RTCPeerConnection, RTCSessionDescription, RTCIceCandidate, MediaStream, etc.) to manage real-time audio/video streams between users and AI agents.
Dynamic Peer Management: Peers are dynamically added/removed as users join or leave lobbies. The system handles ICE candidate negotiation, connection state changes, and media stream routing.
Audio/Video UI: Audio and video streams are rendered in the browser using standard HTML media elements.

Extensibility and Planned Enhancements

Pluggable STT/TTS Backends: Support for additional speech-to-text and text-to-speech providers is planned.
Custom AI Agent Personalities: Future versions will allow more detailed customization of AI agent behavior, voice, and appearance.
Improved Moderation and Controls: Features for lobby moderation, user muting, and reporting are under consideration.
Mobile and Accessibility Improvements: Enhanced support for mobile devices and accessibility features is on the roadmap.

Roadmap

Add support for multiple STT/TTS providers
Expand game logic and add new game types
Improve AI agent customization options
Add lobby moderation and user controls
Enhance mobile and accessibility support

Contributions and feature requests are welcome!

Message sequence for WebRTC application

This application provides session management, lobby management, and WebRTC signaling.

Phase 1: Initial Connection & Session Management

Frontend                    Backend
   |                          |
   |----- HTTP Request ------>|  (Initial page load)
   |                          |  Check session cookie
   |                          |  If no cookie -> create new session
   |                          |  If cookie exists -> validate session
   |<---- HTTP Response ------|  Set/update session cookie
   |                          |
   |----- WebSocket Conn ---->|  Upgrade to WebSocket
   |                          |  Associate WebSocket with session
   |<---- session_established-|  { sessionId }
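
The cookie check in this phase amounts to "reuse a known session, otherwise create one". A minimal in-memory sketch follows; `resolve_session`, the dict-based store, and treating an unknown cookie the same as a missing one are assumptions made for illustration.

```python
import uuid
from typing import Optional


def resolve_session(store: dict[str, dict], cookie: Optional[str]) -> tuple[str, bool]:
    """Return (session_id, created). Unknown or missing cookies get a new session."""
    if cookie is not None and cookie in store:
        return cookie, False          # cookie exists and validates
    session_id = uuid.uuid4().hex     # no cookie, or validation failed
    store[session_id] = {}
    return session_id, True


store: dict[str, dict] = {}
first_id, first_created = resolve_session(store, None)         # initial page load
again_id, again_created = resolve_session(store, first_id)     # returning browser
other_id, other_created = resolve_session(store, "forged-id")  # invalid cookie

print(first_created, again_created, other_created)  # True False True
print(again_id == first_id)                         # True
```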

Phase 2: Lobby Management

Creating a Lobby:

Frontend A                  Backend                    
   |                          |                        
   |----- create_lobby ------>|  { lobbyName, settings }
   |                          |  Create lobby instance
   |                          |  Add user to lobby
   |<---- lobby_created ------|  { lobbyId, lobbyInfo }

Joining a Lobby:

Frontend B                  Backend                    Frontend A
   |----- ws:join_lobby ----->|  { lobbyId }             |
   |                          |  Add user to lobby       |
   |<---- ws:lobby_joined ----|  { lobbyInfo }           |
   |<---- ws:lobby_state -----|  { participants: [...] } |
   |                          |                          |
   |                          |--- ws: user_joined ----->|  { newUser }
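
The join sequence produces three kinds of messages: two for the joining user and one for each participant already present. The sketch below returns them as (recipient, message) pairs instead of writing to WebSockets; the `type` field and the dict layout of the lobby are assumptions.

```python
def join_lobby(lobby: dict, user: str) -> list[tuple[str, dict]]:
    """Add user to the lobby and return the (recipient, message) pairs to send."""
    existing = list(lobby["participants"])
    lobby["participants"].append(user)
    outbox = [
        (user, {"type": "lobby_joined", "lobbyInfo": {"lobbyId": lobby["lobbyId"]}}),
        (user, {"type": "lobby_state", "participants": list(lobby["participants"])}),
    ]
    # Everyone who was already in the lobby learns about the newcomer.
    outbox += [(peer, {"type": "user_joined", "newUser": user}) for peer in existing]
    return outbox


lobby = {"lobbyId": "abc", "participants": ["A"]}
outbox = join_lobby(lobby, "B")
for recipient, message in outbox:
    print(recipient, message["type"])
# B lobby_joined
# B lobby_state
# A user_joined
```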

Phase 3: WebRTC Signaling Initiation

When all required participants are in the lobby, the backend initiates WebRTC negotiation:

Frontend A                  Backend                    Frontend B
   |                          |                          |
   |                          |  Check if conditions     |
   |                          |  are met for WebRTC      |
   |<--- start_webrtc_nego ---|  { participants }        |
   |                          |--- start_webrtc_nego --->|  { participants }
   |                          |                          |
   | Create RTCPeerConnection |                          |  Create RTCPeerConnection
   | Set up local media       |                          |  Set up local media
   |                          |                          |
   |<-- negotiation_needed ---|                          |--- negotiation_needed --->|

Phase 4: WebRTC Offer/Answer Exchange

Frontend A (Initiator)      Backend                    Frontend B (Receiver)
   |                          |                          |
   |  createOffer()           |                          |
   |  setLocalDescription()   |                          |
   |                          |                          |
   |----- webrtc_offer ------>|  { offer, targetUser }   |
   |                          |------ webrtc_offer ----->|
   |                          |                          |  setRemoteDescription()
   |                          |                          |  createAnswer()
   |                          |                          |  setLocalDescription()
   |                          |                          |
   |                          |<----- webrtc_answer -----|  { answer, targetUser }
   |<----- webrtc_answer -----|                          |
   |  setRemoteDescription()  |                          |

Phase 5: ICE Candidate Exchange

Frontend A                  Backend                    Frontend B
   |                          |                          |
   |  ICE gathering starts    |                          |  ICE gathering starts
   |                          |                          |
   |------ ice_candidate ---->|  { candidate, target }   |
   |                          |----- ice_candidate ----->|  addIceCandidate()
   |                          |                          |
   |                          |<---- ice_candidate ------|  { candidate, target }
   |<----- ice_candidate -----|                          |  addIceCandidate()
   |                          |                          |
   |  (Repeat for all ICE candidates collected)          |
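
In Phases 4 and 5 the backend only forwards messages to the named peer. A relay of that kind might look like the sketch below. The `type`, `from`, and `reason` fields are assumptions, as is replying to the sender with an `error` message when the target is unknown; the `targetUser` and `target` field names follow the diagrams above.

```python
from typing import Optional

RELAYED_TYPES = {"webrtc_offer", "webrtc_answer", "ice_candidate"}


def relay(message: dict, sender: str, connected: set[str]) -> Optional[tuple[str, dict]]:
    """Return (recipient, message) for a signaling message, or None if not relayed."""
    if message.get("type") not in RELAYED_TYPES:
        return None
    target = message.get("targetUser") or message.get("target")
    if target not in connected or target == sender:
        return sender, {"type": "error", "reason": "unknown target"}
    # Strip the routing field and tell the recipient who sent the message.
    forwarded = {k: v for k, v in message.items() if k not in ("targetUser", "target")}
    forwarded["from"] = sender
    return target, forwarded


connected = {"A", "B"}
offer = relay({"type": "webrtc_offer", "offer": "<sdp>", "targetUser": "B"}, "A", connected)
ice = relay({"type": "ice_candidate", "candidate": "<cand>", "target": "A"}, "B", connected)
lost = relay({"type": "webrtc_answer", "answer": "<sdp>", "targetUser": "C"}, "B", connected)

print(offer)  # ('B', {'type': 'webrtc_offer', 'offer': '<sdp>', 'from': 'A'})
print(ice)    # ('A', {'type': 'ice_candidate', 'candidate': '<cand>', 'from': 'B'})
print(lost)   # ('B', {'type': 'error', 'reason': 'unknown target'})
```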

Phase 6: Connection Establishment & State Management

Frontend A                  Backend                    Frontend B
   |                          |                          |
   |  onconnectionstatechange |                          |  onconnectionstatechange
   |                          |                          |
   |--- webrtc_state_change ->|  { state: "connecting" } |
   |                          |-- webrtc_state_change -->|  { state: "connecting" }
   |                          |                          |
   |  P2P Connection Established (WebRTC direct)         |
   |<===================== Direct Media Flow ===========>|
   |                          |                          |
   |-- webrtc_state_change -->|  { state: "connected" }  |
   |                          |-- webrtc_state_change -->|  { state: "connected" }
   |                          |                          |
   |<---- connection_ready ---|                          |
   |                          |----- connection_ready -->|

Key Message Types

Session Management:

session_established - Confirms session creation/restoration
session_expired - Session timeout notification

Lobby Management:

create_lobby / lobby_created
join_lobby / lobby_joined
leave_lobby / user_left
lobby_state - Current lobby participants and settings
lobby_destroyed - Lobby cleanup

WebRTC Signaling:

start_webrtc_negotiation - Triggers WebRTC setup
webrtc_offer - SDP offer
webrtc_answer - SDP answer
ice_candidate - ICE candidate exchange
webrtc_state_change - Connection state updates
connection_ready - P2P connection established

Error Handling:

error - Generic error message
lobby_full - Lobby at capacity
webrtc_failed - WebRTC negotiation failure
session_invalid - Session validation failed

Implementation Considerations:

Session Persistence: Store session data in Redis/database for horizontal scaling
Lobby State: Maintain lobby state in memory with periodic persistence
WebSocket Management: Handle reconnections and cleanup properly
WebRTC Timeout: Implement timeouts for offer/answer and ICE gathering
Error Recovery: Graceful fallbacks when WebRTC negotiation fails
Security: Validate session cookies and sanitize all incoming messages

The backend acts as the signaling server, routing WebRTC negotiation messages between peers while managing application state. Once the P2P connection is established, media flows directly between clients, but the WebSocket connection remains for application-level messaging.

Development Tools

Python version

Many of the packages required for ML are either unavailable or not available in compatible versions for Python 3.13 (the default on Ubuntu Plucky). If your host runs a Python newer than 3.12 and no python-3.12 packages are available to install, you can use the 'python-3.12' container to build a Python 3.12 package:

# Build the .deb
docker compose build python-3.12 
# Copy the .deb from the container image to ./build
docker compose run python-3.12
# Install the pkg:
sudo dpkg -i build/python*.deb

TypeScript Type Generation

The project includes automatic TypeScript type generation from the FastAPI OpenAPI schema:

# Generate types and check for API evolution
./generate-ts-types.sh

# Check API evolution without regenerating types
./check-api-evolution.sh

API Evolution Detection

The system automatically detects when new API endpoints are added to the server but not implemented in the TypeScript client:

Automatic Detection: Warnings appear in browser console during development
Command Line Tools: Integrated into the type generation workflow
Implementation Stubs: Provides ready-to-use code templates for new endpoints

See ./API_EVOLUTION.md for detailed documentation.
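
Conceptually, the detection is a set difference between the endpoints declared in the OpenAPI schema and those the client implements. The sketch below illustrates that idea only; it is not the logic of `generate-ts-types.sh` or `check-api-evolution.sh`, and `unimplemented_endpoints` and the sample schema are hypothetical.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def unimplemented_endpoints(openapi: dict,
                            implemented: set[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs declared in the schema but missing from the client."""
    declared = {
        (method.upper(), path)
        for path, operations in openapi.get("paths", {}).items()
        for method in operations
        if method.lower() in HTTP_METHODS  # skip non-operation keys such as "parameters"
    }
    return sorted(declared - implemented)


# Hypothetical schema fragment using endpoints mentioned on this page.
schema = {
    "paths": {
        "/api/bots": {"get": {}},
        "/api/bots/providers": {"get": {}},
        "/api/bots/{bot_name}/join": {"post": {}},
    }
}
client_endpoints = {("GET", "/api/bots")}

missing = unimplemented_endpoints(schema, client_endpoints)
print(missing)
# [('GET', '/api/bots/providers'), ('POST', '/api/bots/{bot_name}/join')]
```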

Languages

Python 74.1%

TypeScript 19.2%

JavaScript 4.6%

CSS 1.1%

Shell 0.8%

Other 0.2%
