Switching to async handle_tool_calls
parent 100e8ea9db · commit 2e5bc651fa

@ -1,105 +0,0 @@
# AIRC (pronounced Eric)

AI is Really Cool

This project provides a simple IRC chat client. It runs the neuralchat model, enhanced with a little bit of RAG to fetch news RSS feeds.

Internally, it is built using PyTorch 2.6 and the Intel IPEX/LLM.
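Once inside the container (see Running below), a quick way to confirm that PyTorch can see the Intel GPU is to query the XPU backend. This is a hedged sketch, assuming the image provides the PyTorch 2.6 XPU build:

```bash
# Should print True (and a device count >= 1) if the GPU is visible to PyTorch
python -c "import torch; print(torch.xpu.is_available(), torch.xpu.device_count())"
```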
NOTE: If running on an Intel Arc A-series graphics processor, fp64 is not supported and may need to either be emulated or have the model quantized. It has been a while since I've had an A-series GPU to test on, so if you run into problems please file an [issue](https://github.com/jketreno/airc/issues); I have some routines I can put in, but don't have a way to test them.
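One way to check whether your device exposes native fp64 (assuming `clinfo` is installed on the host):

```bash
# A-series parts typically report no cl_khr_fp64 / fp64 support
clinfo | grep -i fp64
```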
# Installation

This project uses docker containers to build. As this was originally written to work on an Intel Arc B580 (Battlemage), it requires a kernel that supports that hardware, such as the one documented at [Intel Graphics Preview](https://github.com/canonical/intel-graphics-preview), which runs in Ubuntu Oracular (24.10).

NOTE: You need 'docker compose' installed. See [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/).

## Want to run under WSL2? No can do...

https://www.intel.com/content/www/us/en/support/articles/000093216/graphics/processor-graphics.html

The A- and B-series discrete GPUs do not support SR-IOV, which is required for the GPU partitioning that Microsoft Windows uses to support GPU acceleration in WSL.

## Building

NOTE: You need 'docker compose' installed. See [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/).

```bash
git clone https://github.com/jketreno/airc
cd airc
docker compose build
```
## Running

In order to download the models, you need to have a Hugging Face token. See https://huggingface.co/settings/tokens for information on obtaining a token.

Edit .env to add the following:

```.env
HF_ACCESS_TOKEN=<access token from huggingface>
```

NOTE: Models downloaded by most examples will be placed in the ./cache directory, which is bind mounted to the container.
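For example, after the first run you can verify on the host side that models are being cached and reused across container restarts:

```bash
du -sh ./cache
```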
### AIRC

To launch the airc shell interactively, with the PyTorch 2.6 environment loaded, use the default entrypoint:

```bash
docker compose run --rm airc shell
```

Once in the shell, you can launch the airc.py client in the background and then start model-server.py:

```bash
docker compose run --rm airc shell
src/airc.py --ai-server=http://localhost:5000 &
src/model-server.py
```
By default, src/airc.py will connect to irc.libera.chat on the airc-test channel. See `python src/airc.py --help` for options.

By separating the model-server into its own process, you can develop and tweak the chat backend without losing the IRC connection established by airc.
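A typical iteration loop, sketched from the workflow above:

```bash
# Inside the airc shell: keep the IRC client connected in the background...
src/airc.py --ai-server=http://localhost:5000 &

# ...then stop and relaunch the model backend as often as needed;
# the IRC connection established by airc is unaffected
src/model-server.py
```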
### Jupyter

```bash
docker compose up jupyter -d
```

The default port for inbound connections is 8888 (see docker-compose.yml). $(pwd)/jupyter is bind mounted to /opt/jupyter in the container, which is where notebooks will be saved by default.
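To confirm which host port is actually mapped (for example, if you've edited docker-compose.yml), you can ask docker compose directly:

```bash
docker compose port jupyter 8888
```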
To access the jupyter notebook, go to `https://localhost:8888/jupyter`.

### Monitoring

You can run `ze-monitor` within the launched containers to monitor GPU usage.

```bash
containers=($(docker ps --filter "ancestor=airc" --format "{{.ID}}"))
if [[ ${#containers[*]} -eq 0 ]]; then
    echo "Running airc container not found."
else
    for container in ${containers[@]}; do
        echo "Container ${container} devices:"
        docker exec -it ${container} ze-monitor
    done
fi
```

If an airc container is running, you should see something like:

```
Container 5317c503e771 devices:
Device 1: 8086:A780 (Intel(R) UHD Graphics 770)
Device 2: 8086:E20B (Intel(R) Graphics [0xe20b])
```

You can then launch ze-monitor in that container, specifying the device you wish to monitor:

```
containers=($(docker ps --filter "ancestor=airc" --format "{{.ID}}"))
docker exec -it ${containers[0]} ze-monitor --device 2
```
@ -1,279 +0,0 @@

# ze-monitor

A small utility to monitor Level Zero devices via [Level Zero Sysman](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/sysman/PROG.html#sysman-programming-guide) from the command line, similar to 'top'.
# Installation

Requires Ubuntu Oracular 24.10.

## Easiest

### Install prerequisites

This will add the [Intel Graphics Preview PPA](https://github.com/canonical/intel-graphics-preview) and install the required dependencies:

```bash
sudo apt-get install -y \
    software-properties-common \
    && sudo add-apt-repository -y ppa:kobuk-team/intel-graphics \
    && sudo apt-get update \
    && sudo apt-get install -y \
    libze1 libze-intel-gpu1 libncurses6
```
### Install ze-monitor from .deb package

This will download the ze-monitor .deb package from GitHub, install it, and add the current user to the 'ze-monitor' group to allow running the utility:

```bash
version=0.3.0-1
wget https://github.com/jketreno/ze-monitor/releases/download/v${version}/ze-monitor-${version}_amd64.deb
sudo dpkg -i ze-monitor-${version}_amd64.deb
sudo usermod -a -G ze-monitor $(whoami)
newgrp ze-monitor
```
Congratulations! You can run ze-monitor:

```bash
ze-monitor
```

You should see something like:

```bash
Device 1: 8086:A780 (Intel(R) UHD Graphics 770)
Device 2: 8086:E20B (Intel(R) Graphics [0xe20b])
```

To monitor a device:

```bash
ze-monitor --device 2
```

Check the docs (`man ze-monitor`) for additional details on running the ze-monitor utility.
## Slightly more involved

This project uses docker containers to build. As this was originally written to monitor an Intel Arc B580 (Battlemage), it requires a kernel that supports that hardware, such as the one documented at [Intel Graphics Preview](https://github.com/canonical/intel-graphics-preview), which runs in Ubuntu Oracular (24.10). It will monitor any Level Zero device, even those using the i915 driver.

NOTE: You need 'docker compose' installed. See [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/).

```
git clone https://github.com/jketreno/ze-monitor.git
cd ze-monitor
docker compose build
sudo apt install libze1 libncurses6
version=$(cat src/version.txt)
docker compose run --remove-orphans --rm \
    ze-monitor \
    cp /opt/ze-monitor-static/build/ze-monitor-${version}_amd64.deb \
    /opt/ze-monitor/build
sudo dpkg -i build/ze-monitor-${version}_amd64.deb
```
# Security

In order for ze-monitor to read the performance metric units (PMU) in the Linux kernel, it needs elevated permissions. The easiest way is to install the .deb package and add the user to the ze-monitor group, or run under sudo (e.g., `sudo ze-monitor ...`).

The specific capabilities required to monitor the GPU are documented in [Perf Security](https://www.kernel.org/doc/html/v5.1/admin-guide/perf-security.html) and [man capabilities](https://man7.org/linux/man-pages/man7/capabilities.7.html). These include:
| Capability          | Reason                                                |
|:--------------------|:------------------------------------------------------|
| CAP_DAC_READ_SEARCH | Bypass all filesystem read access checks              |
| CAP_PERFMON         | Access to perf_events (vs. overloaded CAP_SYS_ADMIN)  |
| CAP_SYS_PTRACE      | PTRACE_MODE_READ_REALCREDS ptrace access mode check   |
To configure ze-monitor to run with those privileges, you can use `setcap` to set the correct capabilities on the ze-monitor binary. You can further secure your system by creating a user group specifically for running the utility and restricting execution of the command to users in that group. That is what the .deb package does.
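As a sketch, the manual equivalent of what the package sets up might look like this (the install path and group layout here are assumptions; the package may differ in detail):

```bash
# Assumption: ze-monitor installed at /usr/bin/ze-monitor
sudo groupadd --system ze-monitor                  # dedicated group for the utility
sudo chgrp ze-monitor /usr/bin/ze-monitor
sudo chmod 750 /usr/bin/ze-monitor                 # only group members may execute
sudo setcap "cap_perfmon,cap_dac_read_search,cap_sys_ptrace=ep" /usr/bin/ze-monitor
```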
If you install the .deb package from a [Release](https://github.com/jketreno/ze-monitor/releases) or by building it yourself, the package will set the appropriate permissions on ze-monitor at installation and make it executable only by those in the 'ze-monitor' group.

## Anyone can run ze-monitor
If you build from source and want to set the capabilities:

```bash
sudo setcap "cap_perfmon,cap_dac_read_search,cap_sys_ptrace=ep" build/ze-monitor
getcap build/ze-monitor
```

Any user can then run `build/ze-monitor` and monitor the GPU.
# Build outside container

## Prerequisites

If you would like to build outside of docker, you need the following packages installed:

```
sudo apt-get install -y \
    build-essential \
    libfmt-dev \
    libncurses-dev
```

In addition, you need the Intel drivers installed, which are available from the `kobuk-team/intel-graphics` PPA:

```
sudo apt-get install -y \
    software-properties-common \
    && sudo add-apt-repository -y ppa:kobuk-team/intel-graphics \
    && sudo apt-get update \
    && sudo apt-get install -y \
    libze-intel-gpu1 \
    libze1 \
    libze-dev
```
## Building

```
mkdir -p build
cd build
cmake ..
make
```

## Running

```
build/ze-monitor
```
## Build and install .deb

In order to build the .deb package, you need the following packages installed:

```bash
sudo apt-get install -y \
    debhelper \
    devscripts \
    rpm \
    rpm2cpio
```

You can then build the .deb:

```bash
if [ -d build ]; then
    cd build
fi
version=$(cat ../src/version.txt)
cpack
sudo dpkg -i packages/ze-monitor_${version}_amd64.deb
```

You can then run ze-monitor from your path:

```bash
ze-monitor
```
# Developing

To run the built binary without building a full .deb package, you can build and run on the host by compiling in the container:

```
docker compose run --rm ze-monitor build.sh
build/ze-monitor
```

The build.sh script will build the binary in /opt/ze-monitor/build, which is volume mounted to the host's build directory.

NOTE: See [Security](#security) for information on running ze-monitor with required kernel access capabilities.
# Running

NOTE: See [Security](#security) for information on running ze-monitor with required kernel access capabilities.

If running within a docker container, the container environment does not have access to the host's `/proc/fd`, which is necessary to obtain information about processes outside the current container that are using the GPU. As such, only processes running within the container running ze-monitor will be listed as using the GPU.
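If you do need a containerized ze-monitor to see host processes, one possible workaround (at the cost of PID-namespace isolation; the image name here is an assumption) is to share the host PID namespace:

```bash
# --pid=host lets the container resolve /proc entries for host processes;
# --device /dev/dri passes through the GPU render nodes
docker run --rm -it --pid=host --device /dev/dri ze-monitor ze-monitor --device 2
```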
## List available devices

```
ze-monitor
```

Example output:

```bash
$ ze-monitor
Device 1: 8086:E20B (Intel(R) Graphics [0xe20b])
Device 2: 8086:A780 (Intel(R) UHD Graphics 770)
```
## Show details for a given device

```
sudo ze-monitor --info --device ( PCIID | # | BDF | UUID | /dev/dri/render* )
```

Example output:

```bash
$ sudo ze-monitor --device 2 --info
Device: 8086:A780 (Intel(R) UHD Graphics 770)
UUID: 868080A7-0400-0000-0002-000000000000
BDF: 0000:0000:0002:0000
PCI ID: 8086:A780
Subdevices: 0
Serial Number: unknown
Board Number: unknown
Brand Name: unknown
Model Name: Intel(R) UHD Graphics 770
Vendor Name: Intel(R) Corporation
Driver Version: 0CB7EFCAD5695B7EC5C8CE6
Type: GPU
Is integrated with host: Yes
Is a sub-device: No
Supports error correcting memory: No
Supports on-demand page-faulting: No
Engines: 7
Engine 1: ZES_ENGINE_GROUP_RENDER_SINGLE
Engine 2: ZES_ENGINE_GROUP_MEDIA_DECODE_SINGLE
Engine 3: ZES_ENGINE_GROUP_MEDIA_DECODE_SINGLE
Engine 4: ZES_ENGINE_GROUP_MEDIA_ENCODE_SINGLE
Engine 5: ZES_ENGINE_GROUP_MEDIA_ENCODE_SINGLE
Engine 6: ZES_ENGINE_GROUP_COPY_SINGLE
Engine 7: ZES_ENGINE_GROUP_MEDIA_ENHANCEMENT_SINGLE
Temperature Sensors: 0
```

NOTE: See [Security](#security) for information on running ze-monitor with required kernel access capabilities.
## Monitor a given device

```
sudo ze-monitor --device ( PCIID | # | BDF | UUID | /dev/dri/render* ) \
    --interval ms
```

NOTE: See [Security](#security) for information on running ze-monitor with required kernel access capabilities.

Output:

```bash
$ sudo ze-monitor --device 2 --interval 500
Device: 8086:E20B (Intel(R) Graphics [0xe20b])
Total Memory: 12809404416
Free memory: [# 55% ############################                    ]
Power usage: 165.0W
------------------------------------------------------------------------------------------
  PID COMMAND-LINE
      USED MEMORY   SHARED MEMORY   ENGINE FLAGS
------------------------------------------------------------------------------------------
    1 /sbin/init splash
      MEM: 106102784   SHR: 100663296   FLAGS: RENDER COMPUTE
 1606 /usr/lib/systemd/systemd-logind
      MEM: 106102784   SHR: 100663296   FLAGS: RENDER COMPUTE
 5164 /usr/bin/gnome-shell
      MEM: 530513920   SHR: 503316480   FLAGS: RENDER COMPUTE
 5237 /usr/bin/Xwayland :1024 -rootless -nores...isplayfd 6 -initfd 7 -byteswappedclients
      MEM: 0   SHR: 0   FLAGS:
40480 python chat.py
      MEM: 5544226816   SHR: 0   FLAGS: DMA COMPUTE
```

If you pass `--one-shot`, statistics will be gathered, displayed, and then ze-monitor will exit.
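For example, to capture a single snapshot of device 2 and return to the shell:

```bash
sudo ze-monitor --device 2 --one-shot
```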
@ -1,56 +0,0 @@

# JAMES KETRENOS

Software architect, designer, developer, and team lead
Beaverton, OR 97003

james@ketrenos.com
(503) 501 8281
Seeking an opportunity to contribute to the advancement of energy-efficient AI solutions, James is a driven problem solver, solution creator, technical leader, and skilled software developer focused on rapid, high-quality results, with an eye toward bringing solutions to market.
## SUMMARY

Problem-solving: Trusted resource for executive leadership, able to identify opportunities to bridge technical gaps, adopt new technologies, and improve efficiency and quality for internal and external customers.

Proficient: Adept in compiled and interpreted languages, the software frameworks built around them, and front- and back-end infrastructure. Leverages deep and varied experience to find solutions quickly, and rapidly ramps on new and emerging technologies.

Experienced: 20+ years of experience as an end-to-end Linux software architect, team lead, developer, system administrator, and user, working with teams to integrate a myriad of technologies into existing ecosystems.

Leader: Frequent project lead spanning all areas of development and phases of the product life cycle, from pre-silicon to post-launch support. Capable change agent and mentor, providing technical engineering guidance to multiple teams and organizations.

Communicates: Thrives on helping people solve problems, educating others so they can better understand problems and work toward solutions.
## RECENT HISTORY

2024 - Present

* Developed 'ze-monitor', a lightweight C++ Linux application leveraging Level Zero Sysman APIs to provide 'top'-like device monitoring of Intel GPUs. https://github.com/jketreno/ze-monitor
* Developed 'airc', an LLM pipeline allowing interactive queries about James' resume. Utilizing both in-context and fine-tuned approaches, questions asked about James are answered using information from his resume and portfolio. Includes a full-stack React web UI, a command-line client, and an IRC bot integration. https://github.com/jketreno/airc
2018-2024: Intel® Graphics Software Staff Architect and Lead

* Redefined how Intel approaches graphics enabling on Linux to meet customer and product timelines.
* Spearheaded internal projects to prove out the developer and customer deployment experience when using Intel graphics products with PyTorch, working to ensure all ingredients are available and consumable for success (from kernel driver integration, runtime, and framework integration up to containerized Python workload deployment).
* Focused on improving the customer experience of Intel graphics software for Linux in the data center, in high-performance compute clusters, and for end users. Worked with several teams and business units to close gaps and improve software, documentation, and release methodologies.
* Worked with hardware and firmware teams to scope and define architectural solutions for customer features.

1998-2018: Open Source Software Architect and Lead

* Defined software architecture for handheld devices, tablets, Internet of Things, smart appliances, and emerging technologies. Key resource to executive staff for investigating emerging technologies and driving solutions to close existing gaps.
* James' career at Intel has been diverse. His strongest skills are quickly ramping on technologies being utilized in the market, identifying gaps in existing solutions, and working with teams to close those gaps. He excels at adopting and fitting new technology trends as they materialize in the industry.
## PROLONGED HISTORY

The following are technical areas in which James has been an architect, team lead, and/or individual contributor:

* Linux release infrastructure overhaul: Identified bottlenecks in the CI/CD build pipeline, built a proof of concept, and moved it to production for generating releases of Intel graphics software (https://dgpu-docs.intel.com), as well as internal dashboards and infrastructure for tracking build and release pipelines. JavaScript, HTML, Markdown, RTD, bash/python, Linux packaging, Linux repositories, Linux OS release life cycles, sqlite3. Worked with multiple teams across Intel to meet Intel's requirements for public websites and to integrate with existing build and validation methodologies, while educating teams on tools and infrastructure available from the ecosystem (vs. roll-your-own).
* Board Explorer: Web app targeting the developer ecosystem around new single-board computers, providing quick access to board details, circuits, and programming information. Delivered as a pure front-end service (no backend required): https://board-explorer.github.io/board-explorer/#quark_mcu_dev_kit_d2000. Tight coordination with the UX design team. JavaScript, HTML, CSS, XML, hardware specs, programming specs.
* (internal) Travel Requisition: Internal HTML application and backend enabling internal organizations to request travel approval, with a manager front end to track budgetary expenditures in order to make approve/deny decisions. NodeJS, JavaScript, Polymer, SQL. Tight coordination with internal requirements providers and UX design teams.
* Developer Journey: Web infrastructure allowing engineers to document DIY processes. Front end for parsing, viewing, and following projects. Back end for managing submitted content (extended markdown) including images, videos, and screencasts. Tight coordination with the UX design team.
* Robotics: Worked with teams to align on a ROS (Robot OS) roadmap. Presented at the Embedded Linux Conference on the state of open source and robotics. LIDAR, Intel RealSense, opencv, python, C. Developed a robotic, vision-controlled Stewart platform that could play the marble game Labyrinth.
* Moblin and MeeGo architect: Focused on overall software architecture as well as moving forward multi-touch and the industry shift to resolution-independent applications; all in a time before smart phones as we know them today. Qt, HTML5, EFL.
* Marblin: An HTML/WebGL graphical application simulating the 2D collision physics of marbles in a 3D rendered canvas.
* Linux Kernel: Developed and maintained the initial Intel PRO/Wireless 2100, 2200, and 3945 drivers in the Linux kernel. C, software-defined radios, IEEE 802.11, upstream kernel drivers; team lead for the team that took over the Intel wireless drivers; internal coordination on technical and legal issues surrounding the wireless stack.
* Open source at Intel: Built proof-of-concepts to illustrate to management the potential and opportunities for Intel in embracing open source and Linux.
* Intel Intercast Technology: Team lead for Intel Intercast software for Windows. Worked with 3rd-party companies to integrate the technology into their solutions.
@ -1,132 +0,0 @@

# Professional Projects

## 1995 - 1998: Intel Intercast Technology
* OS: Microsoft Windows Application, WinTV
* Languages: C++
* Role: Team lead and software architect
* Microsoft media infrastructure
* Windows kernel driver work
* Worked with internal teams and external companies to expand compatible hardware and integrate with Windows
* Integration of Internet Explorer via COM embedding into the Intercast Viewer

## 1999 - 2024: Linux evangelist
* One of the initial members of Intel's Open Source Technology Center (OTC)
* Worked across Intel organizational boundaries to educate teams on the benefits and working model of the Linux open source ecosystem
* Deep understanding of licensing issues, political dynamics, community goals, and business needs
* Frequent resource for executive management and teams looking to leverage open source software

## 2000 - 2001: COM on Linux Prototype
* Distributed component object model
* Languages: C++, STL, Flex, Yacc, Bison
* Role: Team lead and architect
* Evaluated key performance differences between the Microsoft Component Object Model's (COM) IUnknown (QueryInterface, AddRef, Release) and the Common Object Request Broker Architecture (CORBA) for both in-process and distributed cross-process and remote communication.
* Developed a prototype tool-chain and functional code providing a Linux-compatible implementation of COM

## 1998 - 2000: Intel Dot Station
* Languages: Java, C
* Designed and built a "visual lens" Java plugin for Netscape Navigator
* Role: Software architect

## 2000 - 2002: Carrier Grade Linux
* OS distribution work
* Contributed to the Linux Standard Base specification
* Role: Team lead and software architect working with internal and external collaborators

## 2004 - 2006: Intel Wireless Linux Kernel Driver
* Languages: C
* Authored the original ipw2100, ipw2200, and ipw3945 Linux kernel drivers
* Built the IEEE 802.11 wireless subsystem
* Hosted the Wireless Birds-of-a-Feather talk at the Ottawa Linux Symposium
* Maintained the SourceForge web presence, IRC channel, and community

## 2015 - 2018: Robotics
* Languages: C, Python, NodeJS
* "Maker" blogs on developing a Stewart platform
* Image recognition and tracking
* Presented at the Embedded Linux Conference
## 2012 - 2017: RT24 - crosswalk
* Chromium-based native web application host
* Role: Team lead and software architect
* Worked with WebGL, Web Assembly, Native Client (NaCl)
* Several internal presentations at various corporate events

## 2007 - 2009: Moblin
* Tablet-targeted OS distribution
* Role: Team lead, software architect, and requirements owner
* Technology evaluation: Cairo, EFL, GTK, Clutter
* Languages: C, C++, OpenGL

## 2012 - Web Sys Info
* W3C
* Tizen Working Group

## 2007 - 2017: Marblin
* An interactive graphical stress test of rendering contexts
* Ported to each framework being used for OS development
* Originally written in C using Clutter; ported to WebGL and EFL

## 2009 - 2011: MeeGo
* The merging of the Linux Foundation's Moblin with Nokia's Maemo
* Coordinated and worked across business groups at Intel and Nokia
* Role: Team lead and software architect
* Focused on:
  * Resolution-independent user interfaces
  * Multi-touch enabling in X
* Educated teams on the interface paradigm shift to "mobile first"
* Presented at the MeeGo Conference
* Languages: C++, Qt, HTML5
## Android on Intel

## 2011 - 2013: Tizen
* Rendering framework: Enlightenment Foundation Libraries (EFL)
* Focused on: API specifications
* Languages: JavaScript, HTML, C

## Robotics

## Quark

## Board Explorer

## Stewart Platform

## Developer Journey

## Product and Team Tracker

## Travel Tool

## Drones

## Security Mitigations

## 2019 - 2024: Intel Graphics Architect
* Technologies: C, JavaScript, HTML5, React, Markdown, bash, GitHub, GitHub Actions, Docker, Clusters, Data Center, Machine Learning, git
* Role:
  * Set strategic direction for working with the open source ecosystem
  * Worked with hardware and software architects to plan, execute, and support features
  * Set strategic direction for overhauling the customer experience of Intel graphics on Linux

# Personal Projects

1995 - 2023: Photo Management Software
* Languages: C, JavaScript, PHP, HTML5, CSS, Polymer, React, SQL
* Role: Personal photo management software, including facial recognition
* Image classification, clustering, and identity

2020 - 2025: Eikona Android App
* OS: Android
* Languages: Java, Expo, React
* Role: Maintainer for the Android port

2019 - 2023: Peddlers of Ketran
* Languages: JavaScript, React, NodeJS, HTML5, CSS
* Features: Audio, video, and text chat. Full game plus expansions.
* Role: Self-hosted online multiplayer clone of Settlers of Catan

2025: Ze-Monitor
* C++ utility leveraging the Level Zero API to monitor GPUs
* https://github.com/jketreno/ze-monitor
2949  src/ketr-chat/package-lock.json (generated): diff suppressed because it is too large
@ -19,6 +19,7 @@
     "react": "^19.0.0",
     "react-dom": "^19.0.0",
     "react-markdown": "^10.1.0",
+    "react-plotly.js": "^2.6.0",
     "react-scripts": "5.0.1",
     "react-spinners": "^0.15.0",
     "rehype-katex": "^7.0.1",
@ -73,7 +73,6 @@ interface ControlsParams {
     systemInfo: SystemInfo,
     toggleTool: (tool: Tool) => void,
     toggleRag: (tool: Tool) => void,
-    setRags: (rags: Tool[]) => void,
     setSystemPrompt: (prompt: string) => void,
     reset: (types: ("rags" | "tools" | "history" | "system-prompt")[], message: string) => Promise<void>
 };
@ -427,8 +426,26 @@ const App = () => {
     }, [sessionId, rags, setRags, setSnack, loc]);
 
     const toggleRag = async (tool: Tool) => {
-        setSnack("RAG is not yet implemented", "warning");
-    }
+        tool.enabled = !tool.enabled
+        try {
+            const response = await fetch(getConnectionBase(loc) + `/api/rags/${sessionId}`, {
+                method: 'PUT',
+                headers: {
+                    'Content-Type': 'application/json',
+                    'Accept': 'application/json',
+                },
+                body: JSON.stringify({ "tool": tool?.name, "enabled": tool.enabled }),
+            });
+
+            const rags = await response.json();
+            setRags([...rags])
+            setSnack(`${tool?.name} ${tool.enabled ? "enabled" : "disabled"}`);
+        } catch (error) {
+            console.error('Fetch error:', error);
+            setSnack(`${tool?.name} ${tool.enabled ? "enabling" : "disabling"} failed.`, "error");
+            tool.enabled = !tool.enabled
+        }
+    };
 
     const toggleTool = async (tool: Tool) => {
         tool.enabled = !tool.enabled
@ -544,7 +561,7 @@ const App = () => {
 
     const drawer = (
         <>
-            {sessionId !== undefined && systemInfo !== undefined && <Controls {...{ tools, rags, reset, systemPrompt, toggleTool, toggleRag, setRags, setSystemPrompt, systemInfo }} />}
+            {sessionId !== undefined && systemInfo !== undefined && <Controls {...{ tools, rags, reset, systemPrompt, toggleTool, toggleRag, setSystemPrompt, systemInfo }} />}
         </>
     );
@ -787,17 +804,8 @@ const App = () => {
                 {drawer}
             </Drawer>
         </Box>
-        <Box
-            component="main"
-            sx={{
-                flexGrow: 1,
-                overflow: 'auto'
-            }}
-            className="ChatBox">
-
-            <Box className="Conversation"
-                sx={{ flexGrow: 2, p: 1 }}
-                ref={conversationRef}>
+        <Box component="main" sx={{ flexGrow: 1, overflow: 'auto' }} className="ChatBox">
+            <Box className="Conversation" sx={{ flexGrow: 2, p: 1 }} ref={conversationRef}>
                 {conversation.map((message, index) => {
                     const formattedContent = message.content.trim();
@ -844,26 +852,26 @@ const App = () => {
                         />
                     </div>
                 </Box>
-            </Box>
 
             <Box className="Query" sx={{ display: "flex", flexDirection: "row", p: 1 }}>
                 <TextField
                     variant="outlined"
                     disabled={processing}
                     autoFocus
                     fullWidth
                     type="text"
                     value={query}
                     onChange={(e) => setQuery(e.target.value)}
                     onKeyDown={handleKeyPress}
                     placeholder="Enter your question..."
                     id="QueryInput"
                 />
                 <AccordionActions>
                     <Tooltip title="Send">
                         <Button sx={{ m: 0 }} variant="contained" onClick={sendQuery}><SendIcon /></Button>
                     </Tooltip>
                 </AccordionActions>
+            </Box>
         </Box>
     </Box>
110  src/server.py
@ -51,9 +51,11 @@ from bs4 import BeautifulSoup
 from fastapi import FastAPI, HTTPException, BackgroundTasks, Request
 from fastapi.responses import JSONResponse, StreamingResponse, FileResponse, RedirectResponse
 from fastapi.middleware.cors import CORSMiddleware
-from utils import rag
+from utils import rag as Rag
 
 from tools import (
+    get_tool_alias,
     get_weather_by_location,
     get_current_datetime,
     get_ticker_price,
@ -61,11 +63,10 @@ from tools import (
 )
 
 rags = [
-    { "name": "JPK", "enabled": False, "description": "Expert data about James Ketrenos, including work history, personal hobbies, and projects." },
-    { "name": "LKML", "enabled": False, "description": "Full associative data for entire LKML mailing list archive." },
+    { "name": "JPK", "enabled": True, "description": "Expert data about James Ketrenos, including work history, personal hobbies, and projects." },
+    # { "name": "LKML", "enabled": False, "description": "Full associative data for entire LKML mailing list archive." },
 ]
 
 
 def get_installed_ram():
     try:
         with open('/proc/meminfo', 'r') as f:
@ -138,7 +139,7 @@ OLLAMA_API_URL = "http://ollama:11434" # Default Ollama local endpoint
 #MODEL_NAME = "deepseek-r1:7b"
 #MODEL_NAME = "llama3.2"
 MODEL_NAME = "qwen2.5:7b"
-LOG_LEVEL="debug"
+LOG_LEVEL="info"
 USE_TLS=False
 WEB_HOST="0.0.0.0"
 WEB_PORT=5000
@ -295,21 +296,21 @@ async def handle_tool_calls(message):
                 ret = None
             else:
                 ret = get_ticker_price(ticker)
-            tools_used.append(f"{tool}({ticker})")
+            tools_used.append(f"{get_tool_alias(tool)}({ticker})")
         case 'summarize_site':
             url = arguments.get('url');
             question = arguments.get('question', 'what is the summary of this content?')
             ret = await summarize_site(url, question)
-            tools_used.append(f"{tool}('{url}', '{question}')")
+            tools_used.append(f"{get_tool_alias(tool)}('{url}', '{question}')")
         case 'get_current_datetime':
             tz = arguments.get('timezone')
             ret = get_current_datetime(tz)
-            tools_used.append(f"{tool}('{tz}')")
+            tools_used.append(f"{get_tool_alias(tool)}('{tz}')")
         case 'get_weather_by_location':
             city = arguments.get('city')
             state = arguments.get('state')
             ret = get_weather_by_location(city, state)
-            tools_used.append(f"{tool}('{city}', '{state}')")
+            tools_used.append(f"{get_tool_alias(tool)}('{city}', '{state}')")
         case _:
             ret = None
     response.append({
@ -411,13 +412,14 @@ def llm_tools(tools):
 
 # %%
 class WebServer:
-    def __init__(self, logging, client, model=MODEL_NAME):
+    def __init__(self, logging, client, collection, model=MODEL_NAME):
         self.logging = logging
         self.app = FastAPI()
         self.contexts = {}
         self.client = client
         self.model = model
         self.processing = False
+        self.collection = collection
 
         self.app.add_middleware(
             CORSMiddleware,
@ -451,9 +453,9 @@ class WebServer:
                     case "system-prompt":
                         context["system"] = [{"role": "system", "content": system_message}]
                         response["system-prompt"] = { "system-prompt": system_message }
-                    case "rag":
-                        context["rag"] = rags.copy()
-                        response["rags"] = context["rag"]
+                    case "rags":
+                        context["rags"] = rags.copy()
+                        response["rags"] = context["rags"]
                     case "tools":
                         context["tools"] = default_tools(tools)
                         response["tools"] = context["tools"]
@ -461,14 +463,13 @@ class WebServer:
                         context["history"] = []
                         response["history"] = context["history"]
                 if not response:
-                    return JSONResponse({ "error": "Usage: { reset: rag|tools|history|system-prompt}"})
+                    return JSONResponse({ "error": "Usage: { reset: rags|tools|history|system-prompt}"})
                 else:
                     self.save_context(context_id)
                     return JSONResponse(response)
 
             except:
-                return JSONResponse({ "error": "Usage: { reset: rag|tools|history|system-prompt}"})
+                return JSONResponse({ "error": "Usage: { reset: rags|tools|history|system-prompt}"})
 
 
         @self.app.put('/api/system-prompt/{context_id}')
         async def put_system_prompt(context_id: str, request: Request):
@ -529,7 +530,7 @@ class WebServer:
         @self.app.get('/api/history/{context_id}')
         async def get_history(context_id: str):
             context = self.upsert_context(context_id)
-            return JSONResponse(context["history"])
+            return JSONResponse(context["ragless_history"])
 
         @self.app.get('/api/tools/{context_id}')
         async def get_tools(context_id: str):
@ -560,6 +561,26 @@ class WebServer:
             context = self.upsert_context(context_id)
             return JSONResponse(context["rags"])
 
+        @self.app.put('/api/rags/{context_id}')
+        async def put_rags(context_id: str, request: Request):
+            if not is_valid_uuid(context_id):
+                logging.warning(f"Invalid context_id: {context_id}")
+                return JSONResponse({"error": "Invalid context_id"}, status_code=400)
+            context = self.upsert_context(context_id)
+            try:
+                data = await request.json()
+                modify = data["tool"]
+                enabled = data["enabled"]
+                for tool in context["rags"]:
+                    if modify == tool["name"]:
+                        tool["enabled"] = enabled
+                        self.save_context(context_id)
+                        return JSONResponse(context["rags"])
+                return JSONResponse({ "status": f"{modify} not found in tools." }), 404
+            except:
+                return JSONResponse({ "status": "error" }), 405
+
+
         @self.app.get('/api/health')
         async def health_check():
             return JSONResponse({"status": "healthy"})
@ -621,7 +642,6 @@ class WebServer:
 
         return self.contexts[session_id]
 
-
     def create_context(self, context_id = None):
         if not context_id:
             context_id = str(uuid.uuid4())
@ -629,6 +649,7 @@ class WebServer:
             "id": context_id,
             "system": [{"role": "system", "content": system_message}],
             "history": [],
+            "ragless_history": [],
             "tools": default_tools(tools),
             "rags": rags.copy()
         }
@ -658,20 +679,40 @@ class WebServer:
 
         self.processing = True
 
+        history = context["history"]
+        ragless_history = context["ragless_history"]
+
+        rag_used = []
+        rag_docs = []
+        for rag in context["rags"]:
+            if rag["enabled"] and rag["name"] == "JPK": # Only support JPK rag right now...
+                yield {"status": "processing", "message": f"Checking RAG context {rag['name']}..."}
+                matches = Rag.find_similar(llm=self.client, collection=self.collection, query=content, top_k=10)
+                if len(matches):
+                    rag_used.append(rag['name'])
+                    rag_docs.extend(matches)
+
+        preamble = ""
+        if len(rag_docs):
+            preamble = "Context:\n"
+            for doc in rag_docs:
+                preamble += doc
+            preamble += "\nHuman: "
+
+        # Figure
+        history.append({"role": "user", "content": preamble + content})
+        ragless_history.append({"role": "user", "content": content})
+
+        messages = context["system"] + history[-1:]
+
         try:
-            history = context["history"]
-            history.append({"role": "user", "content": content})
-
-            messages = context["system"] + history[-1:]
-            #logging.info(messages)
-
             yield {"status": "processing", "message": "Processing request..."}
 
             # Use the async generator in an async for loop
             response = self.client.chat(model=self.model, messages=messages, tools=llm_tools(context["tools"]))
             tools_used = []
 
-            yield {"status": "processing", "message": "Initial response received"}
+            yield {"status": "processing", "message": "Initial response received..."}
 
             if 'tool_calls' in response.get('message', {}):
                 yield {"status": "processing", "message": "Processing tool calls..."}
@ -704,7 +745,15 @@ class WebServer:
                 final_message = {"role": "assistant", "content": reply, 'metadata': {"title": f"🛠️ Tool(s) used: {','.join(tools_used)}"}}
             else:
                 final_message = {"role": "assistant", "content": reply}
+            if len(rag_used):
+                if "metadata" in final_message:
+                    final_message["metadata"]["title"] += f"🔍 RAG(s) used: {','.join(rag_used)}"
+                else:
+                    final_message["metadata"] = { "title": f"🔍 RAG(s) used: {','.join(rag_used)}" }
+
             history.append(final_message)
+            ragless_history.append(final_message)
 
             yield {"status": "done", "message": final_message}
 
         except Exception as e:
@ -732,7 +781,16 @@ def main():
     client = ollama.Client(host=args.ollama_server)
     model = args.ollama_model
 
-    web_server = WebServer(logging, client, model)
+    documents = Rag.load_text_files("doc")
+    print(f"Documents loaded {len(documents)}")
+    collection = Rag.get_vector_collection()
+    chunks = Rag.create_chunks_from_documents(documents)
+    Rag.add_embeddings_to_collection(client, collection, chunks)
+    doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)
+    print(f"Document types: {doc_types}")
+    print(f"Vectorstore created with {collection.count()} documents")
+
+    web_server = WebServer(logging, client, collection, model)
     logging.info(f"Starting web server at http://{args.web_host}:{args.web_port}")
     web_server.run(host=args.web_host, port=args.web_port, use_reloader=False)
108  src/utils/rag.py
@ -1 +1,107 @@
-rag = "exists"
+__all__ = [
+    'load_text_files',
+    'create_chunks_from_documents',
+    'get_vector_collection',
+    'add_embeddings_to_collection',
+    'find_similar'
+]
+
+import os
+import glob
+
+import chromadb
+import ollama
+from langchain.text_splitter import CharacterTextSplitter
+from langchain.schema import Document # Import the Document class
+
+if __name__ == "__main__":
+    # When running directly, use absolute imports
+    import defines
+else:
+    # When imported as a module, use relative imports
+    from . import defines
+
+def load_text_files(directory, encoding="utf-8"):
+    file_paths = glob.glob(os.path.join(directory, "**/*"), recursive=True)
+    documents = []
+
+    for file_path in file_paths:
+        if os.path.isfile(file_path): # Ensure it's a file, not a directory
+            try:
+                with open(file_path, "r", encoding=encoding) as f:
+                    content = f.read()
+
+                # Extract top-level directory
+                rel_path = os.path.relpath(file_path, directory) # Relative to base directory
+                top_level_dir = rel_path.split(os.sep)[0] # Get the first directory in the path
+
+                documents.append(Document(
+                    page_content=content, # Required format for LangChain
+                    metadata={"doc_type": top_level_dir, "path": file_path}
+                ))
+            except Exception as e:
+                print(f"Failed to load {file_path}: {e}")
+
+    return documents
+
+def get_vector_collection(path=defines.persist_directory, name="documents"):
+    # Initialize ChromaDB client
+    chroma_client = chromadb.PersistentClient(path=path, settings=chromadb.Settings(anonymized_telemetry=False))
+
+    # Check if the collection exists and delete it
+    if os.path.exists(path):
+        try:
+            chroma_client.delete_collection(name=name)
+        except Exception as e:
+            print(f"Failed to delete existing collection: {e}")
+
+    return chroma_client.get_or_create_collection(name=name)
+
+# Function to generate embeddings using Ollama
+def get_embedding(llm, text):
+    response = llm.embeddings(model=defines.model, prompt=text)
+    return response["embedding"]
+
+def add_embeddings_to_collection(llm, collection, chunks):
+    # Store documents in ChromaDB
+    for i, text_or_doc in enumerate(chunks):
+        # If input is a Document, extract the text content
+        if isinstance(text_or_doc, Document):
+            text = text_or_doc.page_content
+        else:
+            text = text_or_doc # Assume it's already a string
+
+        embedding = get_embedding(llm, text)
+        collection.add(
+            ids=[str(i)],
+            documents=[text],
+            embeddings=[embedding]
+        )
+
+def find_similar(llm, collection, query, top_k=3):
+    query_embedding = get_embedding(llm, query)
+    results = collection.query(
+        query_embeddings=[query_embedding],
+        n_results=top_k
+    )
+    return results["documents"][0] # List of top_k matching documents
+
+def create_chunks_from_documents(docs):
+    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
+    return text_splitter.split_documents(docs)
+
+if __name__ == "__main__":
+    # When running directly, use absolute imports
+    import defines
+    llm = ollama.Client(host=defines.ollama_api_url)
+    documents = load_text_files("doc")
+    print(f"Documents loaded {len(documents)}")
+    collection = get_vector_collection()
+    chunks = create_chunks_from_documents(documents)
+    add_embeddings_to_collection(llm, collection, chunks)
+    doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)
+    print(f"Document types: {doc_types}")
+    print(f"Vectorstore created with {collection.count()} documents")
+    query = "Can you describe James Ketrenos' work history?"
+    top_docs = find_similar(llm, collection, query, top_k=3)
+    print(top_docs)