Switching to async handle_tool_calls

This commit is contained in:
James Ketr 2025-04-01 18:07:11 -07:00
parent 100e8ea9db
commit 2e5bc651fa
9 changed files with 3183 additions and 633 deletions

View File

@ -1,105 +0,0 @@
# AIRC (pronounced Eric)
AI is Really Cool
This project provides a simple IRC chat client. It runs the neuralchat model, enhanced with a little bit of RAG to fetch news RSS feeds.
Internally, it is built using PyTorch 2.6 and Intel IPEX/LLM.
NOTE: If running on an Intel Arc A-series graphics processor, fp64 is not supported and may need to be emulated or the model quantized. It has been a while since I've had an A-series GPU to test on, so if you run into problems please file an [issue](https://github.com/jketreno/airc/issues)--I have some routines I can put in, but don't have a way to test them.
# Installation
This project uses Docker containers to build. As this was originally written to work on an Intel Arc B580 (Battlemage), it requires a kernel that supports that hardware, such as the one documented at [Intel Graphics Preview](https://github.com/canonical/intel-graphics-preview), which runs in Ubuntu Oracular (24.10).
NOTE: You need 'docker compose' installed. See [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/)
## Want to run under WSL2? No can do...
https://www.intel.com/content/www/us/en/support/articles/000093216/graphics/processor-graphics.html
The A- and B-series discrete GPUs do not support SR-IOV, required for the GPU partitioning that Microsoft Windows uses in order to support GPU acceleration in WSL.
## Building
NOTE: You need 'docker compose' installed. See [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/)
```bash
git clone https://github.com/jketreno/airc
cd airc
docker compose build
```
## Running
In order to download the models, you need to have a Hugging Face token. See https://huggingface.co/settings/tokens for information on obtaining a token.
Edit .env to add the following:
```.env
HF_ACCESS_TOKEN=<access token from huggingface>
```
NOTE: Models downloaded by most examples will be placed in the ./cache directory, which is bind mounted to the container.
### AIRC
To launch the airc shell interactively with the PyTorch 2.6 environment loaded, use the default entrypoint:
```bash
docker compose run --rm airc shell
```
Once in the shell, you can launch the airc.py client in the background and then start model-server.py:
```bash
docker compose run --rm airc shell
src/airc.py --ai-server=http://localhost:5000 &
src/model-server.py
```
By default, src/airc.py will connect to irc.libera.chat on the airc-test channel. See `python src/airc.py --help` for options.
By separating the model-server into its own process, you can develop and tweak the chat backend without losing the IRC connection established by airc.
### Jupyter
```bash
docker compose up jupyter -d
```
The default port for inbound connections is 8888 (see docker-compose.yml). $(pwd)/jupyter is bind mounted to /opt/jupyter in the container, which is where notebooks will be saved by default.
To access the jupyter notebook, go to `https://localhost:8888/jupyter`.
### Monitoring
You can run `ze-monitor` within the launched containers to monitor GPU usage.
```bash
containers=($(docker ps --filter "ancestor=airc" --format "{{.ID}}"))
if [[ ${#containers[*]} -eq 0 ]]; then
echo "Running airc container not found."
else
for container in ${containers[@]}; do
echo "Container ${container} devices:"
docker exec -it ${container} ze-monitor
done
fi
```
If an airc container is running, you should see something like:
```
Container 5317c503e771 devices:
Device 1: 8086:A780 (Intel(R) UHD Graphics 770)
Device 2: 8086:E20B (Intel(R) Graphics [0xe20b])
```
You can then launch ze-monitor in that container specifying the device you wish to monitor:
```
containers=($(docker ps --filter "ancestor=airc" --format "{{.ID}}"))
docker exec -it ${containers[0]} ze-monitor --device 2
```

View File

@ -1,279 +0,0 @@
# ze-monitor
A small utility to monitor Level Zero devices via
[Level Zero Sysman](https://oneapi-src.github.io/level-zero-spec/level-zero/latest/sysman/PROG.html#sysman-programming-guide)
from the command line, similar to 'top'.
# Installation
Requires Ubuntu Oracular 24.10.
## Easiest
### Install prerequisites
This will add the [Intel Graphics Preview PPA](https://github.com/canonical/intel-graphics-preview) and install the required dependencies:
```bash
sudo apt-get install -y \
software-properties-common \
&& sudo add-apt-repository -y ppa:kobuk-team/intel-graphics \
&& sudo apt-get update \
&& sudo apt-get install -y \
libze1 libze-intel-gpu1 libncurses6
```
### Install ze-monitor from .deb package
This will download the ze-monitor .deb package from GitHub, install it, and add the current user to the 'ze-monitor' group to allow running the utility:
```bash
version=0.3.0-1
wget https://github.com/jketreno/ze-monitor/releases/download/v${version}/ze-monitor-${version}_amd64.deb
sudo dpkg -i ze-monitor-${version}_amd64.deb
sudo usermod -a -G ze-monitor $(whoami)
newgrp ze-monitor
```
Congratulations! You can run ze-monitor:
```bash
ze-monitor
```
You should see something like:
```bash
Device 1: 8086:A780 (Intel(R) UHD Graphics 770)
Device 2: 8086:E20B (Intel(R) Graphics [0xe20b])
```
To monitor a device:
```bash
ze-monitor --device 2
```
Check the docs (`man ze-monitor`) for additional details on running the ze-monitor utility.
## Slightly more involved
This project uses docker containers to build. As this was originally written to monitor an Intel Arc B580 (Battlemage), it requires a kernel that supports that hardware, such as the one documented at [Intel Graphics Preview](https://github.com/canonical/intel-graphics-preview), which runs in Ubuntu Oracular (24.10). It will monitor any Level Zero device, even those using the i915 driver.
NOTE: You need 'docker compose' installed. See [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/)
```
git clone https://github.com/jketreno/ze-monitor.git
cd ze-monitor
docker compose build
sudo apt install libze1 libncurses6
version=$(cat src/version.txt)
docker compose run --remove-orphans --rm \
    ze-monitor \
    cp /opt/ze-monitor-static/build/ze-monitor-${version}_amd64.deb \
       /opt/ze-monitor/build
sudo dpkg -i build/ze-monitor-${version}_amd64.deb
```
# Security
In order for ze-monitor to read the performance metric units (PMU) in the Linux kernel, it needs elevated permissions. The easiest way is to install the .deb package and add the user to the ze-monitor group, or run under sudo (e.g., `sudo ze-monitor ...`).
The specific capabilities required to monitor the GPU are documented in [Perf Security](https://www.kernel.org/doc/html/v5.1/admin-guide/perf-security.html) and [man capabilities](https://man7.org/linux/man-pages/man7/capabilities.7.html). These include:
| Capability | Reason |
|:--------------------|:-----------------------------------------------------|
| CAP_DAC_READ_SEARCH | Bypass all filesystem read access checks |
| CAP_PERFMON | Access to perf_events (vs. overloaded CAP_SYS_ADMIN) |
| CAP_SYS_PTRACE | PTRACE_MODE_READ_REALCREDS ptrace access mode check |
To configure ze-monitor to run with those privileges, you can use `setcap` to set the correct capabilities on the ze-monitor binary. You can further secure your system by creating a user group specifically for running the utility and restricting execution of that command to users in that group; that is what the .deb package does.
If you install the .deb package from a [Release](https://github.com/jketreno/ze-monitor/releases) or by building it, that package will set the appropriate permissions for ze-monitor on installation and set it executable only to those in the 'ze-monitor' group.
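A rough sketch of that group-restricted setup (the install path is an assumption; adjust it to wherever ze-monitor was installed on your system):
```bash
# Sketch only: restrict ze-monitor to members of a dedicated group.
# Assumes the binary lives at /usr/bin/ze-monitor; adjust the path as needed.
sudo groupadd --system ze-monitor
sudo chgrp ze-monitor /usr/bin/ze-monitor
sudo chmod 750 /usr/bin/ze-monitor
sudo setcap "cap_perfmon,cap_dac_read_search,cap_sys_ptrace=ep" /usr/bin/ze-monitor
sudo usermod -a -G ze-monitor $(whoami)
newgrp ze-monitor
```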
## Anyone can run ze-monitor
If you build from source and want to set the capabilities:
```bash
sudo setcap "cap_perfmon,cap_dac_read_search,cap_sys_ptrace=ep" build/ze-monitor
getcap build/ze-monitor
```
Any user can then run `build/ze-monitor` and monitor the GPU.
# Build outside container
## Prerequisites
If you would like to build outside of docker, you need the following packages installed:
```
sudo apt-get install -y \
build-essential \
libfmt-dev \
libncurses-dev
```
In addition, you need the Intel drivers installed, which are available from the `kobuk-team/intel-graphics` PPA:
```
sudo apt-get install -y \
software-properties-common \
&& sudo add-apt-repository -y ppa:kobuk-team/intel-graphics \
&& sudo apt-get update \
&& sudo apt-get install -y \
libze-intel-gpu1 \
libze1 \
libze-dev
```
## Building
```
cd build
cmake ..
make
```
## Running
```
build/ze-monitor
```
## Build and install .deb
In order to build the .deb package, you need the following packages installed:
```bash
sudo apt-get install -y \
debhelper \
devscripts \
rpm \
rpm2cpio
```
You can then build the .deb:
```bash
if [ -d build ]; then
cd build
fi
version=$(cat ../src/version.txt)
cpack
sudo dpkg -i build/packages/ze-monitor_${version}_amd64.deb
```
You can then run ze-monitor from your path:
```bash
ze-monitor
```
# Developing
To run the built binary without building a full .deb package, you can compile in the container and run the resulting binary on the host:
```
docker compose run --rm ze-monitor build.sh
build/ze-monitor
```
The build.sh script will build the binary in /opt/ze-monitor/build, which is volume mounted to the host's build directory.
NOTE: See [Security](#security) for information on running ze-monitor with required kernel access capabilities.
# Running
NOTE: See [Security](#security) for information on running ze-monitor with required kernel access capabilities.
If running within a Docker container, the container does not have access to the host's per-process `/proc` information, which is necessary to obtain information about processes outside the current container that are using the GPU. As such, only processes running within the same container as ze-monitor will be listed as using the GPU.
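If you need host-wide process visibility from inside a container, one untested option is to share the host's PID namespace when launching the container; the image name, device selection, and capability set below are assumptions, not something this project documents:
```bash
# Untested sketch: share the host PID namespace so ze-monitor can see host processes.
# Image name and device path are assumptions; adjust for your setup.
docker run --rm -it --pid=host --device /dev/dri \
    --cap-add PERFMON --cap-add SYS_PTRACE --cap-add DAC_READ_SEARCH \
    ze-monitor ze-monitor --device 2
```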
## List available devices
```
ze-monitor
```
Example output:
```bash
$ ze-monitor
Device 1: 8086:E20B (Intel(R) Graphics [0xe20b])
Device 2: 8086:A780 (Intel(R) UHD Graphics 770)
```
## Show details for a given device
```
sudo ze-monitor --info --device ( PCIID | # | BDF | UUID | /dev/dri/render*)
```
Example output:
```bash
$ sudo ze-monitor --device 2 --info
Device: 8086:A780 (Intel(R) UHD Graphics 770)
UUID: 868080A7-0400-0000-0002-000000000000
BDF: 0000:0000:0002:0000
PCI ID: 8086:A780
Subdevices: 0
Serial Number: unknown
Board Number: unknown
Brand Name: unknown
Model Name: Intel(R) UHD Graphics 770
Vendor Name: Intel(R) Corporation
Driver Version: 0CB7EFCAD5695B7EC5C8CE6
Type: GPU
Is integrated with host: Yes
Is a sub-device: No
Supports error correcting memory: No
Supports on-demand page-faulting: No
Engines: 7
Engine 1: ZES_ENGINE_GROUP_RENDER_SINGLE
Engine 2: ZES_ENGINE_GROUP_MEDIA_DECODE_SINGLE
Engine 3: ZES_ENGINE_GROUP_MEDIA_DECODE_SINGLE
Engine 4: ZES_ENGINE_GROUP_MEDIA_ENCODE_SINGLE
Engine 5: ZES_ENGINE_GROUP_MEDIA_ENCODE_SINGLE
Engine 6: ZES_ENGINE_GROUP_COPY_SINGLE
Engine 7: ZES_ENGINE_GROUP_MEDIA_ENHANCEMENT_SINGLE
Temperature Sensors: 0
```
NOTE: See [Security](#security) for information on running ze-monitor with required kernel access capabilities.
## Monitor a given device
```
sudo ze-monitor --device ( PCIID | # | BDF | UUID | /dev/dri/render* ) \
--interval ms
```
NOTE: See [Security](#security) for information on running ze-monitor with required kernel access capabilities.
Output:
```bash
$ sudo ze-monitor --device 2 --interval 500
Device: 8086:E20B (Intel(R) Graphics [0xe20b])
Total Memory: 12809404416
Free memory: [# 55% ############################ ]
Power usage: 165.0W
------------------------------------------------------------------------------------------
PID COMMAND-LINE
USED MEMORY SHARED MEMORY ENGINE FLAGS
------------------------------------------------------------------------------------------
1 /sbin/init splash
MEM: 106102784 SHR: 100663296 FLAGS: RENDER COMPUTE
1606 /usr/lib/systemd/systemd-logind
MEM: 106102784 SHR: 100663296 FLAGS: RENDER COMPUTE
5164 /usr/bin/gnome-shell
MEM: 530513920 SHR: 503316480 FLAGS: RENDER COMPUTE
5237 /usr/bin/Xwayland :1024 -rootless -nores...isplayfd 6 -initfd 7 -byteswappedclients
MEM: 0 SHR: 0 FLAGS:
40480 python chat.py
MEM: 5544226816 SHR: 0 FLAGS: DMA COMPUTE
```
If you pass `--one-shot`, statistics will be gathered, displayed, and then ze-monitor will exit.
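For example, to take a single sample of device 2 and exit:
```bash
sudo ze-monitor --device 2 --one-shot
```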

View File

@ -1,56 +0,0 @@
# JAMES KETRENOS
software architect, designer, developer, and team lead
Beaverton, OR 97003
james@ketrenos.com
(503) 501 8281
Seeking an opportunity to contribute to the advancement of energy efficient AI solutions, James is a driven problem solver, solution creator, technical leader, and skilled software developer focused on rapid, high-quality results, with an eye toward bringing solutions to the market.
## SUMMARY
Problem-solving: Trusted resource for executive leadership, able to identify opportunities to bridge technical gaps, adopt new technologies, and improve efficiency and quality for internal and external customers.
Proficient: Adept in compiled and interpreted languages, the software frameworks built around them, and front- and back-end infrastructure. Leverages deep and varied experience to quickly find solutions, and rapidly puts new and emerging technologies to use.
Experienced: 20+ years of experience as an end-to-end Linux software architect, team lead, developer, system administrator, and user. Works with teams to integrate a wide range of technologies into existing ecosystems.
Leader: Frequent project lead spanning all areas of development and phases of the product life cycle from pre-silicon to post launch support. Capable change agent and mentor, providing technical engineering guidance to multiple teams and organizations.
Communicator: Thrives on helping people solve problems and on educating others so they can better understand issues and work toward solutions.
## RECENT HISTORY
2024-2025: Present
* Developed 'ze-monitor', a lightweight C++ Linux application leveraging Level Zero Sysman APIs to provide 'top' like device monitoring of Intel GPUs. https://github.com/jketreno/ze-monitor
* Developed 'airc', an LLM pipeline allowing interactive queries about James' resume. Using both in-context and fine-tuned approaches, questions about James are answered with information from his resume and portfolio. Includes a full-stack React web UI, a command-line client, and an IRC bot integration. https://github.com/jketreno/airc
2018-2024: Intel® Graphics Software Staff Architect and Lead
* Redefined how Intel approaches graphics enabling on Linux to meet customer and product timelines.
* Spearheaded internal projects to prove out the developer and customer deployment experience when using Intel graphics products with PyTorch, working to ensure all ingredients are available and consumable for success (from kernel driver integration, runtime, framework integration, up to containerized Python workload solution deployment.)
* Focused on improving the customer experience for Intel graphics software for Linux in the data center, high-performance compute clusters, and end users. Worked with several teams and business units to close gaps, improve our software, documentation, and release methodologies.
* Worked with hardware and firmware teams to scope and define architectural solutions for customer features.
1998-2018: Open Source Software Architect and Lead
* Defined software architecture for handheld devices, tablets, Internet of Things, smart appliances, and emerging technologies. Key resource to executive staff for investigating new technologies and driving solutions to close existing gaps.
* James' career at Intel has been diverse. His strongest skills are quickly ramping on technologies being used in the market, identifying gaps in existing solutions, and working with teams to close those gaps. He excels at adopting new technology trends as they materialize in the industry.
## PROLONGED HISTORY
The following are technical areas in which James has been an architect, team lead, and/or individual contributor:
* Linux release infrastructure overhaul: Identified bottlenecks in the CI/CD build pipeline, built a proof of concept, and moved it to production for generating releases of Intel graphics software (https://dgpu-docs.intel.com) as well as internal dashboards and infrastructure for tracking build and release pipelines. JavaScript, HTML, Markdown, RTD, bash/python, Linux packaging, Linux repositories, Linux OS release life cycles, sqlite3. Worked with multiple teams across Intel to meet Intel's requirements for public websites as well as to integrate with existing build and validation methodologies while educating teams on tools and infrastructure available from the ecosystem (vs. roll-your-own).
* Board Explorer: Web app targeting developer ecosystem to utilize new single board computers, providing quick access to board details, circuits, and programming information. Delivered as a pure front-end service (no backend required) https://board-explorer.github.io/board-explorer/#quark_mcu_dev_kit_d2000. Tight coordination with UX design team. JavaScript, HTML, CSS, XML, hardware specs, programming specs.
* (internal) Travel Requisition: Internal HTML application and backend enabling internal organizations to request travel approval and a manager front end to track budgetary expenditures in order to determine approval/deny decisions. NodeJS, JavaScript, Polymer, SQL. Tight coordination with internal requirements providers and UX design teams.
* Developer Journey: Web infrastructure allowing engineers to document DIY processes. Front end for parsing, viewing, and following projects. Back end for managing content submitted (extended markdown) including images, videos, and screencasts. Tight coordination with UX design team.
* Robotics: Worked with teams to align on a ROS (Robot Operating System) roadmap. Presented at the Embedded Linux Conference on the state of open source and robotics. LIDAR, Intel RealSense, OpenCV, Python, C. Developed a vision-controlled Stewart platform that could play the marble game Labyrinth.
* Moblin and MeeGo architect: Focused on overall software architecture as well as advancing multi-touch and the industry shift to resolution-independent applications, all in a time before smartphones as we know them today. Qt, HTML5, EFL.
* Marblin: An HTML/WebGL graphical application simulating the 2D collision physics of marbles in a 3D rendered canvas.
* Linux Kernel: Developed and maintained initial Intel Pro Wireless 2100, 2200, and 3945 drivers in the Linux kernel. C, Software Defined Radios, IEEE 802.11, upstream kernel driver, team lead for team that took over the Intel wireless drivers, internal coordination regarding technical and legal issues surrounding the wireless stack.
* Open source at Intel: Built proof-of-concepts to illustrate to management the potential and opportunities for Intel by embracing open source and Linux.
* Intel Intercast Technology: Team lead for Intel Intercast software for Windows. Worked with 3rd party companies to integrate the technology into their solutions.

View File

@ -1,132 +0,0 @@
# Professional Projects
## 1995 - 1998: Intel Intercast Technology
* OS: Microsoft Windows Application, WinTV
* Languages: C++
* Role: Team lead and software architect
* Microsoft Media infrastructure
* Windows kernel driver work
* Worked with internal teams and external companies to expand compatible hardware and integrate with Windows
* Integration of Internet Explorer via COM embedding into the Intercast Viewer
## 1999 - 2024: Linux evangelist
* One of the initial members of Intel's Open Source Technology Center (OTC)
* Worked across Intel organizational boundaries to educate teams on the benefits and working model of the Linux open source ecosystem
* Deep understanding of licensing issues, political dynamics, community goals, and business needs
* Frequent resource for executive management and teams looking to leverage open source software
## 2000 - 2001: COM on Linux Prototype
* Distributed component object model
* Languages: C++, STL, Flex, Yacc, Bison
* Role: Team lead and architect
* Evaluated key performance differences between Microsoft Component Object Model's (COM) IUnknown (QueryInterface, AddRef, Release) vs. the Component Object Request Broker Architecture (CORBA) for both in-process and distributed cross-process and remote communication.
* Developed prototype tool-chain and functional code providing a Linux compatible implementation of COM
## 1998 - 2000: Intel Dot Station
* Languages: Java, C
* Designed and built a "visual lens" Java plugin for Netscape Navigator
* Role: Software architect
## 2000 - 2002: Carrier Grade Linux
* OS distribution work
* Contributed to the Linux System Base specification
* Role: Team lead and software architect working with internal and external collaborators
## 2004 - 2006: Intel Wireless Linux Kernel Driver
* Languages: C
* Authored original ipw2100, ipw2200, and ipw3945 Linux kernel drivers
* Built IEEE 802.11 wireless subsystem
* Hosted Wireless Birds-of-a-Feather talk at the Ottawa Linux Symposium
* Maintained SourceForge web presence, IRC channel, and community
## 2015 - 2018: Robotics
* Languages: C, Python, NodeJS
* "Maker" blogs on developing a Stewart Platform
* Image recognition and tracking
* Presented at Embedded Linux Conference
## 2012 - 2017: RT24 - crosswalk
* Chromium based native web application host
* Role: Team lead and software architect
* Worked with WebGL, Web Assembly, Native Client (NaCl)
* Several internal presentations at various corporate events
## 2007 - 2009: Moblin
* Tablet-targeting OS distribution
* Role: Team lead, software architect, and requirements definition
* Technology evaluation: Cairo, EFL, GTK, Clutter
* Languages: C, C++, OpenGL
## 2012 - Web Sys Info
* W3C
* Tizen Working Group
## 2007 - 2017: Marblin
* An interactive graphical stress test of rendering contexts
* Ported to each framework being used for OS development
* Originally written in C and using Clutter, ported to WebGL and EFL
## 2009 - 2011: MeeGo
* The merging of Linux Foundation's Moblin with Nokia's Maemo
* Coordinated and worked across business groups at Intel and Nokia
* Role: Team lead and software architect
* Focused on:
* Resolution independent user interfaces
* Multi-touch enabling in X
* Educated teams on the interface paradigm shift to "mobile first"
* Presented at MeeGo Conference
* Languages: C++, Qt, HTML5
## Android on Intel
## 2011 - 2013: Tizen
* Rendering framework: Enlightenment Foundation Library (EFL)
* Focused on: API specifications
* Languages: JavaScript, HTML, C
## Robotics
## Quark
## Board Explorer
## Stewart Platform
## Developer Journey
## Product and Team Tracker
## Travel Tool
## Drones
## Security Mitigations
## 2019 - 2024: Intel Graphics Architect
* Technologies: C, JavaScript, HTML5, React, Markdown, bash, GitHub, GitHub Actions, Docker, Clusters, Data Center, Machine Learning, git
* Role:
* Set strategic direction for working with open source ecosystem
* Worked with hardware and software architects to plan, execute, and support features
* Set strategic direction for overhauling the customer experience for Intel graphics on Linux
# Personal Projects
1995 - 2023: Photo Management Software
* Languages: C, JavaScript, PHP, HTML5, CSS, Polymer, React, SQL
* Role: Personal photo management software, including facial recognition
* Image classification, clustering, and identity
2020 - 2025: Eikona Android App
* OS: Android
* Languages: Java, Expo, React
* Role: Maintainer for Android port
2019 - 2023: Peddlers of Ketran
* Languages: JavaScript, React, NodeJS, HTML5, CSS
* Features: Audio, Video, and Text chat. Full game plus expansions.
* Role: Self-hosted online multiplayer clone of Settlers of Catan
2025: Ze-Monitor
* C++ utility leveraging Level Zero API to monitor GPUs
* https://github.com/jketreno/ze-monitor

File diff suppressed because it is too large.

View File

@@ -19,6 +19,7 @@
     "react": "^19.0.0",
     "react-dom": "^19.0.0",
     "react-markdown": "^10.1.0",
+    "react-plotly.js": "^2.6.0",
     "react-scripts": "5.0.1",
     "react-spinners": "^0.15.0",
     "rehype-katex": "^7.0.1",

View File

@@ -73,7 +73,6 @@ interface ControlsParams {
   systemInfo: SystemInfo,
   toggleTool: (tool: Tool) => void,
   toggleRag: (tool: Tool) => void,
-  setRags: (rags: Tool[]) => void,
   setSystemPrompt: (prompt: string) => void,
   reset: (types: ("rags" | "tools" | "history" | "system-prompt")[], message: string) => Promise<void>
 };
@@ -427,8 +426,26 @@ const App = () => {
   }, [sessionId, rags, setRags, setSnack, loc]);
   const toggleRag = async (tool: Tool) => {
-    setSnack("RAG is not yet implemented", "warning");
-  }
+    tool.enabled = !tool.enabled
+    try {
+      const response = await fetch(getConnectionBase(loc) + `/api/rags/${sessionId}`, {
+        method: 'PUT',
+        headers: {
+          'Content-Type': 'application/json',
+          'Accept': 'application/json',
+        },
+        body: JSON.stringify({ "tool": tool?.name, "enabled": tool.enabled }),
+      });
+      const rags = await response.json();
+      setRags([...rags])
+      setSnack(`${tool?.name} ${tool.enabled ? "enabled" : "disabled"}`);
+    } catch (error) {
+      console.error('Fetch error:', error);
+      setSnack(`${tool?.name} ${tool.enabled ? "enabling" : "disabling"} failed.`, "error");
+      tool.enabled = !tool.enabled
+    }
+  };
   const toggleTool = async (tool: Tool) => {
     tool.enabled = !tool.enabled
@@ -544,7 +561,7 @@ const App = () => {
   const drawer = (
     <>
-      {sessionId !== undefined && systemInfo !== undefined && <Controls {...{ tools, rags, reset, systemPrompt, toggleTool, toggleRag, setRags, setSystemPrompt, systemInfo }} />}
+      {sessionId !== undefined && systemInfo !== undefined && <Controls {...{ tools, rags, reset, systemPrompt, toggleTool, toggleRag, setSystemPrompt, systemInfo }} />}
     </>
   );
@@ -787,17 +804,8 @@ const App = () => {
           {drawer}
         </Drawer>
       </Box>
-      <Box
-        component="main"
-        sx={{
-          flexGrow: 1,
-          overflow: 'auto'
-        }}
-        className="ChatBox">
-        <Box className="Conversation"
-          sx={{ flexGrow: 2, p: 1 }}
-          ref={conversationRef}>
+      <Box component="main" sx={{ flexGrow: 1, overflow: 'auto' }} className="ChatBox">
+        <Box className="Conversation" sx={{ flexGrow: 2, p: 1 }} ref={conversationRef}>
           {conversation.map((message, index) => {
             const formattedContent = message.content.trim();
@@ -844,26 +852,26 @@ const App = () => {
             />
           </div>
         </Box>
-      </Box>
       <Box className="Query" sx={{ display: "flex", flexDirection: "row", p: 1 }}>
         <TextField
           variant="outlined"
           disabled={processing}
           autoFocus
           fullWidth
           type="text"
           value={query}
           onChange={(e) => setQuery(e.target.value)}
           onKeyDown={handleKeyPress}
           placeholder="Enter your question..."
           id="QueryInput"
         />
         <AccordionActions>
           <Tooltip title="Send">
             <Button sx={{ m: 0 }} variant="contained" onClick={sendQuery}><SendIcon /></Button>
           </Tooltip>
         </AccordionActions>
       </Box>
+      </Box>
     </Box>
   </Box>

View File

@@ -51,9 +51,11 @@ from bs4 import BeautifulSoup
 from fastapi import FastAPI, HTTPException, BackgroundTasks, Request
 from fastapi.responses import JSONResponse, StreamingResponse, FileResponse, RedirectResponse
 from fastapi.middleware.cors import CORSMiddleware
-from utils import rag
+from utils import rag as Rag
 from tools import (
+    get_tool_alias,
     get_weather_by_location,
     get_current_datetime,
     get_ticker_price,
@@ -61,11 +63,10 @@ from tools import (
 )
 rags = [
-    { "name": "JPK", "enabled": False, "description": "Expert data about James Ketrenos, including work history, personal hobbies, and projects." },
-    { "name": "LKML", "enabled": False, "description": "Full associative data for entire LKML mailing list archive." },
+    { "name": "JPK", "enabled": True, "description": "Expert data about James Ketrenos, including work history, personal hobbies, and projects." },
+    # { "name": "LKML", "enabled": False, "description": "Full associative data for entire LKML mailing list archive." },
 ]
 def get_installed_ram():
     try:
         with open('/proc/meminfo', 'r') as f:
@@ -138,7 +139,7 @@ OLLAMA_API_URL = "http://ollama:11434" # Default Ollama local endpoint
 #MODEL_NAME = "deepseek-r1:7b"
 #MODEL_NAME = "llama3.2"
 MODEL_NAME = "qwen2.5:7b"
-LOG_LEVEL="debug"
+LOG_LEVEL="info"
 USE_TLS=False
 WEB_HOST="0.0.0.0"
 WEB_PORT=5000
@@ -295,21 +296,21 @@ async def handle_tool_calls(message):
                     ret = None
                 else:
                     ret = get_ticker_price(ticker)
-                tools_used.append(f"{tool}({ticker})")
+                tools_used.append(f"{get_tool_alias(tool)}({ticker})")
            case 'summarize_site':
                url = arguments.get('url');
                question = arguments.get('question', 'what is the summary of this content?')
                ret = await summarize_site(url, question)
-               tools_used.append(f"{tool}('{url}', '{question}')")
+               tools_used.append(f"{get_tool_alias(tool)}('{url}', '{question}')")
            case 'get_current_datetime':
                tz = arguments.get('timezone')
                ret = get_current_datetime(tz)
-               tools_used.append(f"{tool}('{tz}')")
+               tools_used.append(f"{get_tool_alias(tool)}('{tz}')")
            case 'get_weather_by_location':
                city = arguments.get('city')
                state = arguments.get('state')
                ret = get_weather_by_location(city, state)
-               tools_used.append(f"{tool}('{city}', '{state}')")
+               tools_used.append(f"{get_tool_alias(tool)}('{city}', '{state}')")
            case _:
                ret = None
        response.append({
@@ -411,13 +412,14 @@ def llm_tools(tools):
 # %%
 class WebServer:
-    def __init__(self, logging, client, model=MODEL_NAME):
+    def __init__(self, logging, client, collection, model=MODEL_NAME):
         self.logging = logging
         self.app = FastAPI()
         self.contexts = {}
         self.client = client
         self.model = model
         self.processing = False
+        self.collection = collection
         self.app.add_middleware(
             CORSMiddleware,
@@ -451,9 +453,9 @@ class WebServer:
                case "system-prompt":
                    context["system"] = [{"role": "system", "content": system_message}]
                    response["system-prompt"] = { "system-prompt": system_message }
-               case "rag":
-                   context["rag"] = rags.copy()
-                   response["rags"] = context["rag"]
+               case "rags":
+                   context["rags"] = rags.copy()
+                   response["rags"] = context["rags"]
                case "tools":
                    context["tools"] = default_tools(tools)
                    response["tools"] = context["tools"]
@@ -461,14 +463,13 @@ class WebServer:
                    context["history"] = []
                    response["history"] = context["history"]
                if not response:
-                   return JSONResponse({ "error": "Usage: { reset: rag|tools|history|system-prompt}"})
+                   return JSONResponse({ "error": "Usage: { reset: rags|tools|history|system-prompt}"})
                else:
                    self.save_context(context_id)
                    return JSONResponse(response)
            except:
-               return JSONResponse({ "error": "Usage: { reset: rag|tools|history|system-prompt}"})
+               return JSONResponse({ "error": "Usage: { reset: rags|tools|history|system-prompt}"})
        @self.app.put('/api/system-prompt/{context_id}')
        async def put_system_prompt(context_id: str, request: Request):
@@ -529,7 +530,7 @@ class WebServer:
        @self.app.get('/api/history/{context_id}')
        async def get_history(context_id: str):
            context = self.upsert_context(context_id)
-           return JSONResponse(context["history"])
+           return JSONResponse(context["ragless_history"])
        @self.app.get('/api/tools/{context_id}')
        async def get_tools(context_id: str):
@@ -560,6 +561,26 @@ class WebServer:
            context = self.upsert_context(context_id)
            return JSONResponse(context["rags"])
+        @self.app.put('/api/rags/{context_id}')
+        async def put_rags(context_id: str, request: Request):
+            if not is_valid_uuid(context_id):
+                logging.warning(f"Invalid context_id: {context_id}")
+                return JSONResponse({"error": "Invalid context_id"}, status_code=400)
+            context = self.upsert_context(context_id)
+            try:
+                data = await request.json()
+                modify = data["tool"]
+                enabled = data["enabled"]
+                for tool in context["rags"]:
+                    if modify == tool["name"]:
+                        tool["enabled"] = enabled
+                        self.save_context(context_id)
+                        return JSONResponse(context["rags"])
+                return JSONResponse({ "status": f"{modify} not found in tools." }), 404
+            except:
+                return JSONResponse({ "status": "error" }), 405
        @self.app.get('/api/health')
        async def health_check():
            return JSONResponse({"status": "healthy"})
@@ -621,7 +642,6 @@ class WebServer:
        return self.contexts[session_id]
    def create_context(self, context_id = None):
        if not context_id:
            context_id = str(uuid.uuid4())
@@ -629,6 +649,7 @@ class WebServer:
            "id": context_id,
            "system": [{"role": "system", "content": system_message}],
            "history": [],
+           "ragless_history": [],
            "tools": default_tools(tools),
            "rags": rags.copy()
        }
@@ -658,20 +679,40 @@ class WebServer:
        self.processing = True
+       history = context["history"]
+       ragless_history = context["ragless_history"]
+       rag_used = []
+       rag_docs = []
+       for rag in context["rags"]:
+           if rag["enabled"] and rag["name"] == "JPK": # Only support JPK rag right now...
+               yield {"status": "processing", "message": f"Checking RAG context {rag['name']}..."}
+               matches = Rag.find_similar(llm=self.client, collection=self.collection, query=content, top_k=10)
+               if len(matches):
+                   rag_used.append(rag['name'])
+                   rag_docs.extend(matches)
+       preamble = ""
+       if len(rag_docs):
+           preamble = "Context:\n"
+           for doc in rag_docs:
+               preamble += doc
+           preamble += "\nHuman: "
+       # Figure
+       history.append({"role": "user", "content": preamble + content})
+       ragless_history.append({"role": "user", "content": content})
+       messages = context["system"] + history[-1:]
        try:
-           history = context["history"]
-           history.append({"role": "user", "content": content})
-           messages = context["system"] + history[-1:]
-           #logging.info(messages)
            yield {"status": "processing", "message": "Processing request..."}
            # Use the async generator in an async for loop
            response = self.client.chat(model=self.model, messages=messages, tools=llm_tools(context["tools"]))
            tools_used = []
-           yield {"status": "processing", "message": "Initial response received"}
+           yield {"status": "processing", "message": "Initial response received..."}
            if 'tool_calls' in response.get('message', {}):
                yield {"status": "processing", "message": "Processing tool calls..."}
@@ -704,7 +745,15 @@ class WebServer:
                final_message = {"role": "assistant", "content": reply, 'metadata': {"title": f"🛠️ Tool(s) used: {','.join(tools_used)}"}}
            else:
                final_message = {"role": "assistant", "content": reply}
+           if len(rag_used):
+               if "metadata" in final_message:
+                   final_message["metadata"]["title"] += f"🔍 RAG(s) used: {','.join(rag_used)}"
+               else:
+                   final_message["metadata"] = { "title": f"🔍 RAG(s) used: {','.join(rag_used)}" }
            history.append(final_message)
+           ragless_history.append(final_message)
            yield {"status": "done", "message": final_message}
        except Exception as e:
@@ -732,7 +781,16 @@ def main():
    client = ollama.Client(host=args.ollama_server)
    model = args.ollama_model
-   web_server = WebServer(logging, client, model)
+   documents = Rag.load_text_files("doc")
+   print(f"Documents loaded {len(documents)}")
+   collection = Rag.get_vector_collection()
+   chunks = Rag.create_chunks_from_documents(documents)
+   Rag.add_embeddings_to_collection(client, collection, chunks)
+   doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)
+   print(f"Document types: {doc_types}")
+   print(f"Vectorstore created with {collection.count()} documents")
+   web_server = WebServer(logging, client, collection, model)
    logging.info(f"Starting web server at http://{args.web_host}:{args.web_port}")
    web_server.run(host=args.web_host, port=args.web_port, use_reloader=False)
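For reference, the new `/api/rags/{context_id}` endpoint added above can be exercised directly with an HTTP client. A rough sketch follows; the port matches the WEB_PORT default shown above, and the UUID is a placeholder for a real session id issued to the client:
```bash
# Sketch: toggle the "JPK" RAG source for an existing session.
# Replace the UUID with a real context id from your client.
curl -X PUT "http://localhost:5000/api/rags/123e4567-e89b-12d3-a456-426614174000" \
  -H "Content-Type: application/json" \
  -d '{"tool": "JPK", "enabled": true}'
```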

View File

@@ -1 +1,107 @@
-rag = "exists"
+__all__ = [
    'load_text_files',
    'create_chunks_from_documents',
    'get_vector_collection',
    'add_embeddings_to_collection',
    'find_similar'
]

import os
import glob
import chromadb
import ollama
from langchain.text_splitter import CharacterTextSplitter
from langchain.schema import Document  # Import the Document class

if __name__ == "__main__":
    # When running directly, use absolute imports
    import defines
else:
    # When imported as a module, use relative imports
    from . import defines

def load_text_files(directory, encoding="utf-8"):
    file_paths = glob.glob(os.path.join(directory, "**/*"), recursive=True)
    documents = []
    for file_path in file_paths:
        if os.path.isfile(file_path):  # Ensure it's a file, not a directory
            try:
                with open(file_path, "r", encoding=encoding) as f:
                    content = f.read()
                # Extract top-level directory
                rel_path = os.path.relpath(file_path, directory)  # Relative to base directory
                top_level_dir = rel_path.split(os.sep)[0]  # Get the first directory in the path
                documents.append(Document(
                    page_content=content,  # Required format for LangChain
                    metadata={"doc_type": top_level_dir, "path": file_path}
                ))
            except Exception as e:
                print(f"Failed to load {file_path}: {e}")
    return documents

def get_vector_collection(path=defines.persist_directory, name="documents"):
    # Initialize ChromaDB client
    chroma_client = chromadb.PersistentClient(path=path, settings=chromadb.Settings(anonymized_telemetry=False))
    # Check if the collection exists and delete it
    if os.path.exists(path):
        try:
            chroma_client.delete_collection(name=name)
        except Exception as e:
            print(f"Failed to delete existing collection: {e}")
    return chroma_client.get_or_create_collection(name=name)

# Function to generate embeddings using Ollama
def get_embedding(llm, text):
    response = llm.embeddings(model=defines.model, prompt=text)
    return response["embedding"]

def add_embeddings_to_collection(llm, collection, chunks):
    # Store documents in ChromaDB
    for i, text_or_doc in enumerate(chunks):
        # If input is a Document, extract the text content
        if isinstance(text_or_doc, Document):
            text = text_or_doc.page_content
        else:
            text = text_or_doc  # Assume it's already a string
        embedding = get_embedding(llm, text)
        collection.add(
            ids=[str(i)],
            documents=[text],
            embeddings=[embedding]
        )

def find_similar(llm, collection, query, top_k=3):
    query_embedding = get_embedding(llm, query)
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k
    )
    return results["documents"][0]  # List of top_k matching documents

def create_chunks_from_documents(docs):
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    return text_splitter.split_documents(docs)

if __name__ == "__main__":
    # When running directly, use absolute imports
    import defines
    llm = ollama.Client(host=defines.ollama_api_url)
    documents = load_text_files("doc")
    print(f"Documents loaded {len(documents)}")
    collection = get_vector_collection()
    chunks = create_chunks_from_documents(documents)
    add_embeddings_to_collection(llm, collection, chunks)
    doc_types = set(chunk.metadata['doc_type'] for chunk in chunks)
    print(f"Document types: {doc_types}")
    print(f"Vectorstore created with {collection.count()} documents")
    query = "Can you describe James Ketrenos' work history?"
    top_docs = find_similar(llm, collection, query, top_k=3)
    print(top_docs)