Switching to async handle_tool_calls

This commit is contained in:
James Ketr 2025-04-01 18:07:11 -07:00
parent 100e8ea9db
commit 2e5bc651fa
9 changed files with 3183 additions and 633 deletions

View File

@ -1,105 +0,0 @@
# AIRC (pronounced Eric)
AI is Really Cool
This project provides a simple IRC chat client. It runs the neuralchat model, enhanced with a little bit of RAG to fetch news RSS feeds.
Internally, it is built using PyTorch 2.6 and Intel IPEX/LLM.
NOTE: If running on an Intel Arc A-series graphics processor, fp64 is not supported and may need to be emulated or the model quantized. It has been a while since I've had an A-series GPU to test on, so if you run into problems please file an [issue](https://github.com/jketreno/airc/issues)--I have some routines I can put in, but don't have a way to test them.
# Installation
This project uses Docker containers to build. As this was originally written to work on an Intel Arc B580 (Battlemage), it requires a kernel that supports that hardware, such as the one documented at [Intel Graphics Preview](https://github.com/canonical/intel-graphics-preview), which runs in Ubuntu Oracular (24.10).
NOTE: You need 'docker compose' installed. See [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/)
## Want to run under WSL2? No can do...
https://www.intel.com/content/www/us/en/support/articles/000093216/graphics/processor-graphics.html
The A- and B-series discrete GPUs do not support SR-IOV, required for the GPU partitioning that Microsoft Windows uses in order to support GPU acceleration in WSL.
## Building
NOTE: You need 'docker compose' installed. See [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/)
```bash
git clone https://github.com/jketreno/airc
cd airc
docker compose build
```
## Running
In order to download the models, you need to have a Hugging Face token. See https://huggingface.co/settings/tokens for information on obtaining a token.
Edit .env to add the following:
```.env
HF_ACCESS_TOKEN=<access token from huggingface>
```
NOTE: Models downloaded by most examples will be placed in the ./cache directory, which is bind mounted to the container.
### AIRC
To launch the airc shell interactively with the PyTorch 2.6 environment loaded, use the default entrypoint:
```bash
docker compose run --rm airc shell
```
Once in the shell, you can launch the airc.py client in the background and then start model-server.py:
```bash
docker compose run --rm airc shell
src/airc.py --ai-server=http://localhost:5000 &
src/model-server.py
```
By default, src/airc.py will connect to irc.libera.chat on the airc-test channel. See `python src/airc.py --help` for options.
By separating the model-server into its own process, you can develop and tweak the chat backend without losing the IRC connection established by airc.
### Jupyter
```bash
docker compose up jupyter -d
```
The default port for inbound connections is 8888 (see docker-compose.yml). $(pwd)/jupyter is bind mounted to /opt/jupyter in the container, which is where notebooks will be saved by default.
To access the jupyter notebook, go to `https://localhost:8888/jupyter`.
### Monitoring
You can run `ze-monitor` within the launched containers to monitor GPU usage.
```bash
containers=($(docker ps --filter "ancestor=airc" --format "{{.ID}}"))
if [[ ${#containers[*]} -eq 0 ]]; then
echo "Running airc container not found."
else
for container in ${containers[@]}; do
echo "Container ${container} devices:"
docker exec -it ${container} ze-monitor
done
fi
```
If an airc container is running, you should see something like:
```
Container 5317c503e771 devices:
Device 1: 8086:A780 (Intel(R) UHD Graphics 770)
Device 2: 8086:E20B (Intel(R) Graphics [0xe20b])
```
You can then launch ze-monitor in that container specifying the device you wish to monitor:
```
containers=($(docker ps --filter "ancestor=airc" --format "{{.ID}}"))
docker exec -it ${containers[0]} ze-monitor --device 2
```

View File

@ -1,279 +0,0 @@
# ze-monitor
A small utility to monitor Level Zero devices via
[Level Zero Sysman](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/sysman/PROG.html#sysman-programming-guide)
from the command line, similar to 'top'.
# Installation
Requires Ubuntu Oracular 24.10.
## Easiest
### Install prerequisites
This will add the [Intel Graphics Preview PPA](https://github.com/canonical/intel-graphics-preview) and install the required dependencies:
```bash
sudo apt-get install -y \
software-properties-common \
&& sudo add-apt-repository -y ppa:kobuk-team/intel-graphics \
&& sudo apt-get update \
&& sudo apt-get install -y \
libze1 libze-intel-gpu1 libncurses6
```
### Install ze-monitor from .deb package
This will download the ze-monitor .deb package from GitHub, install it, and add the current user to the 'ze-monitor' group to allow running the utility:
```bash
version=0.3.0-1
wget https://github.com/jketreno/ze-monitor/releases/download/v${version}/ze-monitor-${version}_amd64.deb
sudo dpkg -i ze-monitor-${version}_amd64.deb
sudo usermod -a -G ze-monitor $(whoami)
newgrp ze-monitor
```
Congratulations! You can run ze-monitor:
```bash
ze-monitor
```
You should see something like:
```bash
Device 1: 8086:A780 (Intel(R) UHD Graphics 770)
Device 2: 8086:E20B (Intel(R) Graphics [0xe20b])
```
To monitor a device:
```bash
ze-monitor --device 2
```
Check the docs (`man ze-monitor`) for additional details on running the ze-monitor utility.
## Slightly more involved
This project uses docker containers to build. As this was originally written to monitor an Intel Arc B580 (Battlemage), it requires a kernel that supports that hardware, such as the one documented at [Intel Graphics Preview](https://github.com/canonical/intel-graphics-preview), which runs in Ubuntu Oracular (24.10). It will monitor any Level Zero device, even those using the i915 driver.
NOTE: You need 'docker compose' installed. See [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/)
```
git clone https://github.com/jketreno/ze-monitor.git
cd ze-monitor
docker compose build
sudo apt install libze1 libncurses6
version=$(cat src/version.txt)
docker compose run --remove-orphans --rm \
    ze-monitor \
    cp /opt/ze-monitor-static/build/ze-monitor-${version}_amd64.deb \
       /opt/ze-monitor/build
sudo dpkg -i build/ze-monitor-${version}_amd64.deb
```
# Security
In order for ze-monitor to read the performance metric units (PMU) in the Linux kernel, it needs elevated permissions. The easiest way is to install the .deb package and add the user to the ze-monitor group, or run under sudo (e.g., `sudo ze-monitor ...`).
The specific capabilities required to monitor the GPU are documented in [Perf Security](https://www.kernel.org/doc/html/v5.1/admin-guide/perf-security.html) and [man capabilities](https://man7.org/linux/man-pages/man7/capabilities.7.html). These include:
| Capability | Reason |
|:--------------------|:-----------------------------------------------------|
| CAP_DAC_READ_SEARCH | Bypass all filesystem read access checks |
| CAP_PERFMON | Access to perf_events (vs. overloaded CAP_SYS_ADMIN) |
| CAP_SYS_PTRACE | PTRACE_MODE_READ_REALCREDS ptrace access mode check |
To configure ze-monitor to run with those privileges, you can use `setcap` to set the correct capabilities on the ze-monitor binary. You can further secure your system by creating a user group specifically for running the utility and restricting execution of that command to users in that group; that is what the .deb package does.
If you install the .deb package from a [Release](https://github.com/jketreno/ze-monitor/releases) or by building it, that package will set the appropriate permissions for ze-monitor on installation and set it executable only to those in the 'ze-monitor' group.
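A rough sketch of that group-restricted setup (the install path is an assumption; adjust it to wherever ze-monitor was installed on your system):
```bash
# Sketch only: restrict ze-monitor to members of a dedicated group.
# Assumes the binary lives at /usr/bin/ze-monitor; adjust the path as needed.
sudo groupadd --system ze-monitor
sudo chgrp ze-monitor /usr/bin/ze-monitor
sudo chmod 750 /usr/bin/ze-monitor
sudo setcap "cap_perfmon,cap_dac_read_search,cap_sys_ptrace=ep" /usr/bin/ze-monitor
sudo usermod -a -G ze-monitor $(whoami)
newgrp ze-monitor
```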
## Anyone can run ze-monitor
If you build from source and want to set the capabilities:
```bash
sudo setcap "cap_perfmon,cap_dac_read_search,cap_sys_ptrace=ep" build/ze-monitor
getcap build/ze-monitor
```
Any user can then run `build/ze-monitor` and monitor the GPU.
# Build outside container
## Prerequisites
If you would like to build outside of docker, you need the following packages installed:
```
sudo apt-get install -y \
build-essential \
libfmt-dev \
libncurses-dev
```
In addition, you need the Intel drivers installed, which are available from the `kobuk-team/intel-graphics` PPA:
```
sudo apt-get install -y \
software-properties-common \
&& sudo add-apt-repository -y ppa:kobuk-team/intel-graphics \
&& sudo apt-get update \
&& sudo apt-get install -y \
libze-intel-gpu1 \
libze1 \
libze-dev
```
## Building
```
cd build
cmake ..
make
```
## Running
```
build/ze-monitor
```
## Build and install .deb
In order to build the .deb package, you need the following packages installed:
```bash
sudo apt-get install -y \
debhelper \
devscripts \
rpm \
rpm2cpio
```
You can then build the .deb:
```bash
if [ -d build ]; then
cd build
fi
version=$(cat ../src/version.txt)
cpack
sudo dpkg -i build/packages/ze-monitor_${version}_amd64.deb
```
You can then run ze-monitor from your path:
```bash
ze-monitor
```
# Developing
To run the built binary without building a full .deb package, you can compile in the container and run the resulting binary on the host:
```
docker compose run --rm ze-monitor build.sh
build/ze-monitor
```
The build.sh script will build the binary in /opt/ze-monitor/build, which is volume mounted to the host's build directory.
NOTE: See [Security](#security) for information on running ze-monitor with required kernel access capabilities.
# Running
NOTE: See [Security](#security) for information on running ze-monitor with required kernel access capabilities.
If running within a Docker container, the container does not have access to the host's per-process `/proc` information, which is necessary to obtain information about processes outside the current container that are using the GPU. As such, only processes running within the same container as ze-monitor will be listed as using the GPU.
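If you need host-wide process visibility from inside a container, one untested option is to share the host's PID namespace when launching the container; the image name, device selection, and capability set below are assumptions, not something this project documents:
```bash
# Untested sketch: share the host PID namespace so ze-monitor can see host processes.
# Image name and device path are assumptions; adjust for your setup.
docker run --rm -it --pid=host --device /dev/dri \
    --cap-add PERFMON --cap-add SYS_PTRACE --cap-add DAC_READ_SEARCH \
    ze-monitor ze-monitor --device 2
```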
## List available devices
```
ze-monitor
```
Example output:
```bash
$ ze-monitor
Device 1: 8086:E20B (Intel(R) Graphics [0xe20b])
Device 2: 8086:A780 (Intel(R) UHD Graphics 770)
```
## Show details for a given device
```
sudo ze-monitor --info --device ( PCIID | # | BDF | UUID | /dev/dri/render*)
```
Example output:
```bash
$ sudo ze-monitor --device 2 --info
Device: 8086:A780 (Intel(R) UHD Graphics 770)
UUID: 868080A7-0400-0000-0002-000000000000
BDF: 0000:0000:0002:0000
PCI ID: 8086:A780
Subdevices: 0
Serial Number: unknown
Board Number: unknown
Brand Name: unknown
Model Name: Intel(R) UHD Graphics 770
Vendor Name: Intel(R) Corporation
Driver Version: 0CB7EFCAD5695B7EC5C8CE6
Type: GPU
Is integrated with host: Yes
Is a sub-device: No
Supports error correcting memory: No
Supports on-demand page-faulting: No
Engines: 7
Engine 1: ZES_ENGINE_GROUP_RENDER_SINGLE
Engine 2: ZES_ENGINE_GROUP_MEDIA_DECODE_SINGLE
Engine 3: ZES_ENGINE_GROUP_MEDIA_DECODE_SINGLE
Engine 4: ZES_ENGINE_GROUP_MEDIA_ENCODE_SINGLE
Engine 5: ZES_ENGINE_GROUP_MEDIA_ENCODE_SINGLE
Engine 6: ZES_ENGINE_GROUP_COPY_SINGLE
Engine 7: ZES_ENGINE_GROUP_MEDIA_ENHANCEMENT_SINGLE
Temperature Sensors: 0
```
NOTE: See [Security](#security) for information on running ze-monitor with required kernel access capabilities.
## Monitor a given device
```
sudo ze-monitor --device ( PCIID | # | BDF | UUID | /dev/dri/render* ) \
--interval ms
```
NOTE: See [Security](#security) for information on running ze-monitor with required kernel access capabilities.
Output:
```bash
$ sudo ze-monitor --device 2 --interval 500
Device: 8086:E20B (Intel(R) Graphics [0xe20b])
Total Memory: 12809404416
Free memory: [# 55% ############################ ]
Power usage: 165.0W
------------------------------------------------------------------------------------------
PID COMMAND-LINE
USED MEMORY SHARED MEMORY ENGINE FLAGS
------------------------------------------------------------------------------------------
1 /sbin/init splash
MEM: 106102784 SHR: 100663296 FLAGS: RENDER COMPUTE
1606 /usr/lib/systemd/systemd-logind
MEM: 106102784 SHR: 100663296 FLAGS: RENDER COMPUTE
5164 /usr/bin/gnome-shell
MEM: 530513920 SHR: 503316480 FLAGS: RENDER COMPUTE
5237 /usr/bin/Xwayland :1024 -rootless -nores...isplayfd 6 -initfd 7 -byteswappedclients
MEM: 0 SHR: 0 FLAGS:
40480 python chat.py
MEM: 5544226816 SHR: 0 FLAGS: DMA COMPUTE
```
If you pass `--one-shot`, statistics will be gathered, displayed, and then ze-monitor will exit.
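For example, to take a single sample of device 2 and exit:
```bash
sudo ze-monitor --device 2 --one-shot
```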

View File

@ -1,56 +0,0 @@
# JAMES KETRENOS
software architect, designer, developer, and team lead
Beaverton, OR 97003
james@ketrenos.com
(503) 501 8281
Seeking an opportunity to contribute to the advancement of energy efficient AI solutions, James is a driven problem solver, solution creator, technical leader, and skilled software developer focused on rapid, high-quality results, with an eye toward bringing solutions to the market.
## SUMMARY
Problem-solving: Trusted resource for executive leadership, able to identify opportunities to bridge technical gaps, adopt new technologies, and improve efficiency and quality for internal and external customers.
Proficient: Adept in compiled and interpreted languages, the software frameworks built around them, and front- and back-end infrastructure. Leverages deep and varied experience to quickly find solutions, and rapidly puts new and emerging technologies to use.
Experienced: 20+ years of experience as an end-to-end Linux software architect, team lead, developer, system administrator, and user. Works with teams to integrate a wide range of technologies into existing ecosystems.
Leader: Frequent project lead spanning all areas of development and phases of the product life cycle from pre-silicon to post launch support. Capable change agent and mentor, providing technical engineering guidance to multiple teams and organizations.
Communicator: Thrives on helping people solve problems and on educating others so they can better understand issues and work toward solutions.
## RECENT HISTORY
2024-2025: Present
* Developed 'ze-monitor', a lightweight C++ Linux application leveraging Level Zero Sysman APIs to provide 'top' like device monitoring of Intel GPUs. https://github.com/jketreno/ze-monitor
* Developed 'airc', an LLM pipeline allowing interactive queries about James' resume. Using both in-context and fine-tuned approaches, questions about James are answered with information from his resume and portfolio. Includes a full-stack React web UI, a command-line client, and an IRC bot integration. https://github.com/jketreno/airc
2018-2024: Intel® Graphics Software Staff Architect and Lead
* Redefined how Intel approaches graphics enabling on Linux to meet customer and product timelines.
* Spearheaded internal projects to prove out the developer and customer deployment experience when using Intel graphics products with PyTorch, working to ensure all ingredients are available and consumable for success (from kernel driver integration, runtime, framework integration, up to containerized Python workload solution deployment.)
* Focused on improving the customer experience for Intel graphics software for Linux in the data center, high-performance compute clusters, and end users. Worked with several teams and business units to close gaps, improve our software, documentation, and release methodologies.
* Worked with hardware and firmware teams to scope and define architectural solutions for customer features.
1998-2018: Open Source Software Architect and Lead
* Defined software architecture for handheld devices, tablets, Internet of Things, smart appliances, and emerging technologies. Key resource to executive staff for investigating new technologies and driving solutions to close existing gaps.
* James' career at Intel has been diverse. His strongest skills are quickly ramping on technologies being used in the market, identifying gaps in existing solutions, and working with teams to close those gaps. He excels at adopting new technology trends as they materialize in the industry.
## PROLONGED HISTORY
The following are technical areas in which James has been an architect, team lead, and/or individual contributor:
* Linux release infrastructure overhaul: Identified bottlenecks in the CI/CD build pipeline, built a proof of concept, and moved it to production for generating releases of Intel graphics software (https://dgpu-docs.intel.com) as well as internal dashboards and infrastructure for tracking build and release pipelines. JavaScript, HTML, Markdown, RTD, bash/python, Linux packaging, Linux repositories, Linux OS release life cycles, sqlite3. Worked with multiple teams across Intel to meet Intel's requirements for public websites as well as to integrate with existing build and validation methodologies while educating teams on tools and infrastructure available from the ecosystem (vs. roll-your-own).
* Board Explorer: Web app targeting developer ecosystem to utilize new single board computers, providing quick access to board details, circuits, and programming information. Delivered as a pure front-end service (no backend required) https://board-explorer.github.io/board-explorer/#quark_mcu_dev_kit_d2000. Tight coordination with UX design team. JavaScript, HTML, CSS, XML, hardware specs, programming specs.
* (internal) Travel Requisition: Internal HTML application and backend enabling internal organizations to request travel approval and a manager front end to track budgetary expenditures in order to determine approval/deny decisions. NodeJS, JavaScript, Polymer, SQL. Tight coordination with internal requirements providers and UX design teams.
* Developer Journey: Web infrastructure allowing engineers to document DIY processes. Front end for parsing, viewing, and following projects. Back end for managing content submitted (extended markdown) including images, videos, and screencasts. Tight coordination with UX design team.
* Robotics: Worked with teams to align on a ROS (Robot Operating System) roadmap. Presented at the Embedded Linux Conference on the state of open source and robotics. LIDAR, Intel RealSense, OpenCV, Python, C. Developed a vision-controlled Stewart platform that could play the marble game Labyrinth.
* Moblin and MeeGo architect: Focused on overall software architecture as well as advancing multi-touch and the industry shift to resolution-independent applications, all in a time before smartphones as we know them today. Qt, HTML5, EFL.
* Marblin: An HTML/WebGL graphical application simulating the 2D collision physics of marbles in a 3D rendered canvas.
* Linux Kernel: Developed and maintained initial Intel Pro Wireless 2100, 2200, and 3945 drivers in the Linux kernel. C, Software Defined Radios, IEEE 802.11, upstream kernel driver, team lead for team that took over the Intel wireless drivers, internal coordination regarding technical and legal issues surrounding the wireless stack.
* Open source at Intel: Built proof-of-concepts to illustrate to management the potential and opportunities for Intel by embracing open source and Linux.
* Intel Intercast Technology: Team lead for Intel Intercast software for Windows. Worked with 3rd party companies to integrate the technology into their solutions.

View File

@ -1,132 +0,0 @@
# Professional Projects
## 1995 - 1998: Intel Intercast Technology
* OS: Microsoft Windows Application, WinTV
* Languages: C++
* Role: Team lead and software architect
* Microsoft Media infrastructure
* Windows kernel driver work
* Worked with internal teams and external companies to expand compatible hardware and integrate with Windows
* Integration of Internet Explorer via COM embedding into the Intercast Viewer
## 1999 - 2024: Linux evangelist
* One of the initial members of Intel's Open Source Technology Center (OTC)
* Worked across Intel organizational boundaries to educate teams on the benefits and working model of the Linux open source ecosystem
* Deep understanding of licensing issues, political dynamics, community goals, and business needs
* Frequent resource for executive management and teams looking to leverage open source software
## 2000 - 2001: COM on Linux Prototype
* Distributed component object model
* Languages: C++, STL, Flex, Yacc, Bison
* Role: Team lead and architect
* Evaluated key performance differences between Microsoft Component Object Model's (COM) IUnknown (QueryInterface, AddRef, Release) vs. the Component Object Request Broker Architecture (CORBA) for both in-process and distributed cross-process and remote communication.
* Developed prototype tool-chain and functional code providing a Linux compatible implementation of COM
## 1998 - 2000: Intel Dot Station
* Languages: Java, C
* Designed and built a "visual lens" Java plugin for Netscape Navigator
* Role: Software architect
## 2000 - 2002: Carrier Grade Linux
* OS distribution work
* Contributed to the Linux System Base specification
* Role: Team lead and software architect working with internal and external collaborators
## 2004 - 2006: Intel Wireless Linux Kernel Driver
* Languages: C
* Authored original ipw2100, ipw2200, and ipw3945 Linux kernel drivers
* Built IEEE 802.11 wireless subsystem
* Hosted Wireless Birds-of-a-Feather talk at the Ottawa Linux Symposium
* Maintained SourceForge web presence, IRC channel, and community
## 2015 - 2018: Robotics
* Languages: C, Python, NodeJS
* "Maker" blogs on developing a Stewart Platform
* Image recognition and tracking
* Presented at Embedded Linux Conference
## 2012 - 2017: RT24 - crosswalk
* Chromium based native web application host
* Role: Team lead and software architect
* Worked with WebGL, Web Assembly, Native Client (NaCl)
* Several internal presentations at various corporate events
## 2007 - 2009: Moblin
* Tablet-targeting OS distribution
* Role: Team lead, software architect, and requirements definition
* Technology evaluation: Cairo, EFL, GTK, Clutter
* Languages: C, C++, OpenGL
## 2012 - Web Sys Info
* W3C
* Tizen Working Group
## 2007 - 2017: Marblin
* An interactive graphical stress test of rendering contexts
* Ported to each framework being used for OS development
* Originally written in C and using Clutter, ported to WebGL and EFL
## 2009 - 2011: MeeGo
* The merging of Linux Foundation's Moblin with Nokia's Maemo
* Coordinated and worked across business groups at Intel and Nokia
* Role: Team lead and software architect
* Focused on:
* Resolution independent user interfaces
* Multi-touch enabling in X
* Educated teams on the interface paradigm shift to "mobile first"
* Presented at MeeGo Conference
* Languages: C++, Qt, HTML5
## Android on Intel
## 2011 - 2013: Tizen
* Rendering framework: Enlightenment Foundation Library (EFL)
* Focused on: API specifications
* Languages: JavaScript, HTML, C
## Robotics
## Quark
## Board Explorer
## Stewart Platform
## Developer Journey
## Product and Team Tracker
## Travel Tool
## Drones
## Security Mitigations
## 2019 - 2024: Intel Graphics Architect
* Technologies: C, JavaScript, HTML5, React, Markdown, bash, GitHub, GitHub Actions, Docker, Clusters, Data Center, Machine Learning, git
* Role:
* Set strategic direction for working with open source ecosystem
* Worked with hardware and software architects to plan, execute, and support features
* Set strategic direction for overhauling the customer experience for Intel graphics on Linux
# Personal Projects
1995 - 2023: Photo Management Software
* Languages: C, JavaScript, PHP, HTML5, CSS, Polymer, React, SQL
* Role: Personal photo management software, including facial recognition
* Image classification, clustering, and identity
2020 - 2025: Eikona Android App
* OS: Android
* Languages: Java, Expo, React
* Role: Maintainer for Android port
2019 - 2023: Peddlers of Ketran
* Languages: JavaScript, React, NodeJS, HTML5, CSS
* Features: Audio, Video, and Text chat. Full game plus expansions.
* Role: Self-hosted online multiplayer clone of Settlers of Catan
2025: Ze-Monitor
* C++ utility leveraging Level Zero API to monitor GPUs
* https://github.com/jketreno/ze-monitor

File diff suppressed because it is too large.

View File

@@ -19,6 +19,7 @@
     "react": "^19.0.0",
     "react-dom": "^19.0.0",
     "react-markdown": "^10.1.0",
+    "react-plotly.js": "^2.6.0",
     "react-scripts": "5.0.1",
     "react-spinners": "^0.15.0",
     "rehype-katex": "^7.0.1",

View File

@@ -73,7 +73,6 @@ interface ControlsParams {
   systemInfo: SystemInfo,
   toggleTool: (tool: Tool) => void,
   toggleRag: (tool: Tool) => void,
-  setRags: (rags: Tool[]) => void,
   setSystemPrompt: (prompt: string) => void,
   reset: (types: ("rags" | "tools" | "history" | "system-prompt")[], message: string) => Promise<void>
 };
@@ -427,8 +426,26 @@ const App = () => {
   }, [sessionId, rags, setRags, setSnack, loc]);
   const toggleRag = async (tool: Tool) => {
-    setSnack("RAG is not yet implemented", "warning");
-  }
+    tool.enabled = !tool.enabled
+    try {
+      const response = await fetch(getConnectionBase(loc) + `/api/rags/${sessionId}`, {
+        method: 'PUT',
+        headers: {
+          'Content-Type': 'application/json',
+          'Accept': 'application/json',
+        },
+        body: JSON.stringify({ "tool": tool?.name, "enabled": tool.enabled }),
+      });
+      const rags = await response.json();
+      setRags([...rags])
+      setSnack(`${tool?.name} ${tool.enabled ? "enabled" : "disabled"}`);
+    } catch (error) {
+      console.error('Fetch error:', error);
+      setSnack(`${tool?.name} ${tool.enabled ? "enabling" : "disabling"} failed.`, "error");
+      tool.enabled = !tool.enabled
+    }
+  };
   const toggleTool = async (tool: Tool) => {
     tool.enabled = !tool.enabled
@@ -544,7 +561,7 @@ const App = () => {
   const drawer = (
     <>
-      {sessionId !== undefined && systemInfo !== undefined && <Controls {...{ tools, rags, reset, systemPrompt, toggleTool, toggleRag, setRags, setSystemPrompt, systemInfo }} />}
+      {sessionId !== undefined && systemInfo !== undefined && <Controls {...{ tools, rags, reset, systemPrompt, toggleTool, toggleRag, setSystemPrompt, systemInfo }} />}
     </>
   );
@@ -787,17 +804,8 @@ const App = () => {
           {drawer}
         </Drawer>
       </Box>
-      <Box
-        component="main"
-        sx={{
-          flexGrow: 1,
-          overflow: 'auto'
-        }}
-        className="ChatBox">
-        <Box className="Conversation"
-          sx={{ flexGrow: 2, p: 1 }}
-          ref={conversationRef}>
+      <Box component="main" sx={{ flexGrow: 1, overflow: 'auto' }} className="ChatBox">
+        <Box className="Conversation" sx={{ flexGrow: 2, p: 1 }} ref={conversationRef}>
           {conversation.map((message, index) => {
             const formattedContent = message.content.trim();
@@ -844,26 +852,26 @@ const App = () => {
             />
           </div>
         </Box>
-      </Box>
       <Box className="Query" sx={{ display: "flex", flexDirection: "row", p: 1 }}>
         <TextField
           variant="outlined"
           disabled={processing}
           autoFocus
           fullWidth
           type="text"
           value={query}
           onChange={(e) => setQuery(e.target.value)}
           onKeyDown={handleKeyPress}
           placeholder="Enter your question..."
           id="QueryInput"
         />
         <AccordionActions>
           <Tooltip title="Send">
             <Button sx={{ m: 0 }} variant="contained" onClick={sendQuery}><SendIcon /></Button>
           </Tooltip>
         </AccordionActions>
       </Box>
+      </Box>
     </Box>
   </Box>

View File

@@ -51,9 +51,11 @@ from bs4 import BeautifulSoup
 from fastapi import FastAPI, HTTPException, BackgroundTasks, Request
 from fastapi.responses import JSONResponse, StreamingResponse, FileResponse, RedirectResponse
 from fastapi.middleware.cors import CORSMiddleware
-from utils import rag
+from utils import rag as Rag
 from tools import (
+    get_tool_alias,
     get_weather_by_location,
     get_current_datetime,
     get_ticker_price,
@@ -61,11 +63,10 @@ from tools import (
 )
 rags = [
-    { "name": "JPK", "enabled": False, "description": "Expert data about James Ketrenos, including work history, personal hobbies, and projects." },
-    { "name": "LKML", "enabled": False, "description": "Full associative data for entire LKML mailing list archive." },
+    { "name": "JPK", "enabled": True, "description": "Expert data about James Ketrenos, including work history, personal hobbies, and projects." },
+    # { "name": "LKML", "enabled": False, "description": "Full associative data for entire LKML mailing list archive." },
 ]
 def get_installed_ram():
     try:
         with open('/proc/meminfo', 'r') as f:
@@ -138,7 +139,7 @@ OLLAMA_API_URL = "http://ollama:11434" # Default Ollama local endpoint
 #MODEL_NAME = "deepseek-r1:7b"
 #MODEL_NAME = "llama3.2"
 MODEL_NAME = "qwen2.5:7b"
-LOG_LEVEL="debug"
+LOG_LEVEL="info"
 USE_TLS=False
 WEB_HOST="0.0.0.0"
 WEB_PORT=5000
@@ -295,21 +296,21 @@ async def handle_tool_calls(message):
                     ret = None
                 else:
                     ret = get_ticker_price(ticker)
-                tools_used.append(f"{tool}({ticker})")
+                tools_used.append(f"{get_tool_alias(tool)}({ticker})")
            case 'summarize_site':
                url = arguments.get('url');
                question = arguments.get('question', 'what is the summary of this content?')
                ret = await summarize_site(url, question)
-               tools_used.append(f"{tool}('{url}', '{question}')")
+               tools_used.append(f"{get_tool_alias(tool)}('{url}', '{question}')")
            case 'get_current_datetime':
                tz = arguments.get('timezone')
                ret = get_current_datetime(tz)
-               tools_used.append(f"{tool}('{tz}')")
+               tools_used.append(f"{get_tool_alias(tool)}('{tz}')")
            case 'get_weather_by_location':
                city = arguments.get('city')
                state = arguments.get('state')
                ret = get_weather_by_location(city, state)
-               tools_used.append(f"{tool}('{city}', '{state}')")
+               tools_used.append(f"{get_tool_alias(tool)}('{city}', '{state}')")
            case _:
                ret = None
        response.append({
@@ -411,13 +412,14 @@ def llm_tools(tools):
 # %%
 class WebServer:
-    def __init__(self, logging, client, model=MODEL_NAME):
+    def __init__(self, logging, client, collection, model=MODEL_NAME):
         self.logging = logging
         self.app = FastAPI()
         self.contexts = {}
         self.client = client
         self.model = model
         self.processing = False
+        self.collection = collection
         self.app.add_middleware(
             CORSMiddleware,
@@ -451,9 +453,9 @@ class WebServer:
                case "system-prompt":
                    context["system"] = [{"role": "system", "content": system_message}]
                    response["system-prompt"] = { "system-prompt": system_message }
-               case "rag":
-                   context["rag"] = rags.copy()
-                   response["rags"] = context["rag"]
+               case "rags":
+                   context["rags"] = rags.copy()
+                   response["rags"] = context["rags"]
                case "tools":
                    context["tools"] = default_tools(tools)
                    response["tools"] = context["tools"]
@@ -461,14 +463,13 @@ class WebServer:
                    context["history"] = []
                    response["history"] = context["history"]
                if not response:
-                   return JSONResponse({ "error": "Usage: { reset: rag|tools|history|system-prompt}"})
+                   return JSONResponse({ "error": "Usage: { reset: rags|tools|history|system-prompt}"})
                else:
                    self.save_context(context_id)
                    return JSONResponse(response)
            except:
-               return JSONResponse({ "error": "Usage: { reset: rag|tools|history|system-prompt}"})
+               return JSONResponse({ "error": "Usage: { reset: rags|tools|history|system-prompt}"})
        @self.app.put('/api/system-prompt/{context_id}')
        async def put_system_prompt(context_id: str, request: Request):
@@ -529,7 +530,7 @@ class WebServer:
        @self.app.get('/api/history/{context_id}')
        async def get_history(context_id: str):
            context = self.upsert_context(context_id)
-           return JSONResponse(context["history"])
+           return JSONResponse(context["ragless_history"])
        @self.app.get('/api/tools/{context_id}')
        async def get_tools(context_id: str):
@@ -560,6 +561,26 @@ class WebServer:
            context = self.upsert_context(context_id)
            return JSONResponse(context["rags"])
+        @self.app.put('/api/rags/{context_id}')
+        async def put_rags(context_id: str, request: Request):
+            if not is_valid_uuid(context_id):
+                logging.warning(f"Invalid context_id: {context_id}")
+                return JSONResponse({"error": "Invalid context_id"}, status_code=400)
+            context = self.upsert_context(context_id)
+            try:
+                data = await request.json()
+                modify = data["tool"]
+                enabled = data["enabled"]
+                for tool in context["rags"]:
+                    if modify == tool["name"]:
+                        tool["enabled"] = enabled
+                        self.save_context(context_id)
+                        return JSONResponse(context["rags"])
+                return JSONResponse({ "status": f"{modify} not found in tools." }), 404
+            except:
+                return JSONResponse({ "status": "error" }), 405
        @self.app.get('/api/health')
        async def health_check():
            return JSONResponse({"status": "healthy"})
@@ -621,7 +642,6 @@ class WebServer:
        return self.contexts[session_id]
    def create_context(self, context_id = None):
        if not context_id:
            context_id = str(uuid.uuid4())
@@ -629,6 +649,7 @@ class WebServer:
            "id": context_id,
            "system": [{"role": "system", "content": system_message}],
            "history": [],
+           "ragless_history": [],
            "tools": default_tools(tools),
            "rags": rags.copy()
        }
@@ -658,20 +679,40 @@ class WebServer:
        self.processing = True
+       history = context["history"]
+       ragless_history = context["ragless_history"]
+       rag_used = []
+       rag_docs = []
+       for rag in context["rags"]:
+           if rag["enabled"] and rag["name"] == "JPK": # Only support JPK rag right now...
+               yield {"status": "processing", "message": f"Checking RAG context {rag['name']}..."}
+               matches = Rag.find_similar(llm=self.client, collection=self.collection, query=content, top_k=10)
+               if len(matches):
+                   rag_used.append(rag['name'])
+                   rag_docs.extend(matches)
+       preamble = ""
+       if len(rag_docs):
+           preamble = "Context:\n"
+           for doc in rag_docs:
+               preamble += doc
+           preamble += "\nHuman: "
+       # Figure
+       history.append({"role": "user", "content": preamble + content})
+       ragless_history.append({"role": "user", "content": content})
+       messages = context["system"] + history[-1:]
        try:
-           history = context["history"]
-           history.append({"role": "user", "content": content})
-           messages = context["system"] + history[-1:]
-           #logging.info(messages)
            yield {"status": "processing", "message": "Processing request..."}
            # Use the async generator in an async for loop
            response = self.client.chat(model=self.model, messages=messages, tools=llm_tools(context["tools"]))
            tools_used = []
-           yield {"status": "processing", "message": "Initial response received"}
+           yield {"status": "processing", "message": "Initial response received..."}
            if 'tool_calls' in response.get('message', {}):
                yield {"status": "processing", "message": "Processing tool calls..."}
@@ -704,7 +745,15 @@ class WebServer:
                final_message = {"role": "assistant", "content": reply, 'metadata': {"title": f"🛠️ Tool(s) used: {','.join(tools_used)}"}}
            else:
                final_message = {"role": "assistant", "content": reply}
+           if len(rag_used):
+               if "metadata" in final_message:
+                   final_message["metadata"]["title"] += f"🔍 RAG(s) used: {','.join(rag_used)}"
+               else:
+                   final_message["metadata"] = { "title": f"🔍 RAG(s) used: {','.join(rag_used)}" }
            history.append(final_message)
+           ragless_history.append(final_message)
            yield {"status": "done", "message": final_message}
        except Exception as e:
@@ -732,7 +781,16 @@ def main():
    client = ollama.Client(host=args.ollama_server)
    model = args.ollama_model
-   web_server = WebServer(logging, client, model)
+   documents = Rag.load_text_files("doc")
+   print(f"Documents loaded {len(documents)}")
+   collection = Rag.get_vector_collection()
+   chunks = Rag.create_chunks_from_documents(documents)
+   Rag.add_embeddings_to_collection(client, collection, chunks)
+   doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)
+   print(f"Document types: {doc_types}")
+   print(f"Vectorstore created with {collection.count()} documents")
+   web_server = WebServer(logging, client, collection, model)
    logging.info(f"Starting web server at http://{args.web_host}:{args.web_port}")
    web_server.run(host=args.web_host, port=args.web_port, use_reloader=False)
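For reference, the new `/api/rags/{context_id}` endpoint added above can be exercised directly with an HTTP client. A rough sketch follows; the port matches the WEB_PORT default shown above, and the UUID is a placeholder for a real session id issued to the client:
```bash
# Sketch: toggle the "JPK" RAG source for an existing session.
# Replace the UUID with a real context id from your client.
curl -X PUT "http://localhost:5000/api/rags/123e4567-e89b-12d3-a456-426614174000" \
  -H "Content-Type: application/json" \
  -d '{"tool": "JPK", "enabled": true}'
```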

View File

@@ -1 +1,107 @@
-rag = "exists"
+__all__ = [
    'load_text_files',
    'create_chunks_from_documents',
    'get_vector_collection',
    'add_embeddings_to_collection',
    'find_similar'
]

import os
import glob
import chromadb
import ollama
from langchain.text_splitter import CharacterTextSplitter
from langchain.schema import Document  # Import the Document class

if __name__ == "__main__":
    # When running directly, use absolute imports
    import defines
else:
    # When imported as a module, use relative imports
    from . import defines

def load_text_files(directory, encoding="utf-8"):
    file_paths = glob.glob(os.path.join(directory, "**/*"), recursive=True)
    documents = []
    for file_path in file_paths:
        if os.path.isfile(file_path):  # Ensure it's a file, not a directory
            try:
                with open(file_path, "r", encoding=encoding) as f:
                    content = f.read()
                # Extract top-level directory
                rel_path = os.path.relpath(file_path, directory)  # Relative to base directory
                top_level_dir = rel_path.split(os.sep)[0]  # Get the first directory in the path
                documents.append(Document(
                    page_content=content,  # Required format for LangChain
                    metadata={"doc_type": top_level_dir, "path": file_path}
                ))
            except Exception as e:
                print(f"Failed to load {file_path}: {e}")
    return documents

def get_vector_collection(path=defines.persist_directory, name="documents"):
    # Initialize ChromaDB client
    chroma_client = chromadb.PersistentClient(path=path, settings=chromadb.Settings(anonymized_telemetry=False))
    # Check if the collection exists and delete it
    if os.path.exists(path):
        try:
            chroma_client.delete_collection(name=name)
        except Exception as e:
            print(f"Failed to delete existing collection: {e}")
    return chroma_client.get_or_create_collection(name=name)

# Function to generate embeddings using Ollama
def get_embedding(llm, text):
    response = llm.embeddings(model=defines.model, prompt=text)
    return response["embedding"]

def add_embeddings_to_collection(llm, collection, chunks):
    # Store documents in ChromaDB
    for i, text_or_doc in enumerate(chunks):
        # If input is a Document, extract the text content
        if isinstance(text_or_doc, Document):
            text = text_or_doc.page_content
        else:
            text = text_or_doc  # Assume it's already a string
        embedding = get_embedding(llm, text)
        collection.add(
            ids=[str(i)],
            documents=[text],
            embeddings=[embedding]
        )

def find_similar(llm, collection, query, top_k=3):
    query_embedding = get_embedding(llm, query)
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k
    )
    return results["documents"][0]  # List of top_k matching documents

def create_chunks_from_documents(docs):
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    return text_splitter.split_documents(docs)

if __name__ == "__main__":
    # When running directly, use absolute imports
    import defines
    llm = ollama.Client(host=defines.ollama_api_url)
    documents = load_text_files("doc")
    print(f"Documents loaded {len(documents)}")
    collection = get_vector_collection()
    chunks = create_chunks_from_documents(documents)
    add_embeddings_to_collection(llm, collection, chunks)
    doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)
    print(f"Document types: {doc_types}")
    print(f"Vectorstore created with {collection.count()} documents")
    query = "Can you describe James Ketrenos' work history?"
    top_docs = find_similar(llm, collection, query, top_k=3)
    print(top_docs)