Content update

James Ketr 2025-04-02 17:44:07 -07:00
parent 5dca368a87
commit 1c6f99a6ae

# Ketr Chat
This LLM agent was built by James Ketrenos to answer any questions you may have about his work history.
In addition to being a RAG-enabled expert system, the LLM is configured with real-time access to weather, stocks, and the current time, and it can answer questions about the contents of a website.
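For illustration, the following is a minimal sketch of how a tool such as the current-time lookup could be described to a tool-calling LLM and dispatched on the backend; the schema layout and function names here are assumptions, not the project's actual code.

```python
# Minimal sketch (not the project's actual code) of an OpenAI/ollama-style tool
# definition plus the dispatcher that runs it when the LLM requests the call.
from datetime import datetime, timezone

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Return the current UTC time in ISO 8601 format.",
            "parameters": {"type": "object", "properties": {}},
        },
    }
]

def dispatch_tool_call(name: str, arguments: dict) -> str:
    """Route a tool call requested by the LLM to its Python implementation."""
    if name == "get_current_time":
        return datetime.now(timezone.utc).isoformat()
    raise ValueError(f"Unknown tool: {name}")
```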
## Parts of Ketr Chat
* Backend Server
  Provides a custom REST API to support the capabilities exposed from the web UI.
  * PyTorch used for LLM communication and inference
  * ChromaDB as a vector store for embedding similarities
  * FastAPI for the HTTP REST API endpoints
  * Serves the static site for production deployment
  * Performs all communication with the LLM (currently via ollama, though I may switch back to Hugging Face transformers)
  * Implements the tool subsystem for tool callbacks from the LLM
  * Manages a ChromaDB vector store, including the chunking and embedding of the documents used to provide RAG content related to my career (see the sketch after this list)
  * Manages all context sessions
  * Currently using qwen2.5:7b, though I frequently switch between models (llama3.2, deepseek-r1:7b, and mistral:7b). I've generally had the best results from qwen2.5. DeepSeek-R1 was very cool; the thinking phase was informative for developing system prompts, but the ollama integration does not support tool calls. That is one reason I'm looking to switch back to Hugging Face transformers.
  * Languages: Python, bash
* Web Frontend
  Provides a responsive UI for interacting with the system.
  * Written using React and MUI.
  * Exposes enough information to know what the LLM is doing on the backend
  * Enables adjusting various parameters, including enabling/disabling tools and RAG, the system prompt, etc.
  * Configured to run in both development and production. In development mode, the server does not serve the Web Frontend and only acts as a REST API endpoint.
  * Languages: JSX, JavaScript, TypeScript, bash
* Ollama container
  If you don't already have ollama installed and running, the container provided in this project is built using the Intel pre-built Ollama package.
* Jupyter notebook
  To facilitate rapid development and prototyping, a Jupyter notebook is provided which runs on the same Python package set as the main server container.
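As a rough illustration of the RAG pipeline mentioned in the Backend Server list above, the sketch below shows how career documents might be chunked, embedded into a ChromaDB collection, and queried at chat time. The collection name, chunk sizes, and file layout are assumptions, not the project's actual code.

```python
# Minimal sketch, assuming a ./docs directory of plain-text career documents
# and ChromaDB's default embedding function; not the project's actual code.
from pathlib import Path

import chromadb

client = chromadb.PersistentClient(path="./cache/chromadb")
collection = client.get_or_create_collection(name="career_docs")

def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split a document into overlapping character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

for doc in Path("docs").glob("*.txt"):
    for i, piece in enumerate(chunk(doc.read_text())):
        collection.add(ids=[f"{doc.stem}-{i}"], documents=[piece],
                       metadatas=[{"source": doc.name}])

# At chat time, the most similar chunks are retrieved and prepended to the prompt.
results = collection.query(query_texts=["What did James work on at Intel?"], n_results=3)
print(results["documents"][0])
```

Overlapping chunks are one common way to keep sentences that straddle a chunk boundary retrievable; the exact chunking strategy used by the server may differ.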
# Installation
This project builds and runs in Docker containers. As it was originally written to target an Intel Arc B580 (Battlemage), it requires a kernel that supports that hardware, such as the one documented at [Intel Graphics Preview](https://github.com/canonical/intel-graphics-preview), which runs in Ubuntu Oracular (24.10).
NOTE: You need 'docker compose' installed. See [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/)
## Want to run under WSL2? No can do...
https://www.intel.com/content/www/us/en/support/articles/000093216/graphics/processor-graphics.html
The A- and B-series discrete GPUs do not support SR-IOV, which is required for the GPU partitioning Microsoft Windows uses to provide GPU acceleration in WSL.
## Building
```bash
git clone https://github.com/jketreno/ketr.chat
cd ketr.chat
docker compose build
```
## Running
To download the models, you need a Hugging Face token. See https://huggingface.co/settings/tokens for information on obtaining one.
Edit .env to add the following:
```.env
HF_ACCESS_TOKEN=<access token from huggingface>
```
NOTE: Models downloaded by most examples will be placed in the ./cache directory, which is bind mounted to the container.
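As a sketch of how that token is typically consumed inside the container (how server.py actually uses it is an assumption), the huggingface_hub library can authenticate with it before any gated model downloads:

```python
# Sketch only: authenticate huggingface_hub with the token from .env so gated
# model downloads succeed. HF_ACCESS_TOKEN matches the .env example above;
# the exact way server.py consumes it is an assumption.
import os

from huggingface_hub import login

token = os.environ.get("HF_ACCESS_TOKEN")
if token:
    login(token=token)
```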
### Ketr Chat
To launch an interactive shell in the ketr.chat container with the PyTorch 2.6 environment loaded, use the default entrypoint:
```bash
docker compose run --rm ketr.chat shell
```
Once in the shell, you can then launch server.py:
```bash
docker compose run --rm ketr.chat shell
python src/server.py
```
If you launch the server without any parameters, it will run the backend server, which will host the static web frontend built during the `docker compose build`.
That is also the behavior when you bring the container up:
```bash
docker compose up -d
```
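Serving the prebuilt frontend from FastAPI typically looks something like the sketch below; the `frontend/build` directory is an assumption about this project's layout, not necessarily what server.py does.

```python
# Minimal sketch of serving a prebuilt React bundle from FastAPI.
# The directory "frontend/build" is an assumption about the project layout.
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

app = FastAPI()

# API routes would be registered here, before the static mount catches "/".
app.mount("/", StaticFiles(directory="frontend/build", html=True), name="frontend")
```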
### Jupyter
```bash
docker compose up jupyter -d
```
The default port for inbound connections is 8888 (see docker-compose.yml). `$(pwd)/jupyter` is bind mounted to `/opt/jupyter` in the container, which is where notebooks are saved by default.
To access the Jupyter notebook, go to `https://localhost:8888/jupyter`.
### Monitoring
You can run `ze-monitor` within the launched containers to monitor GPU usage.
```bash
containers=($(docker ps --filter "ancestor=ketr.chat" --format "{{.ID}}"))
if [[ ${#containers[*]} -eq 0 ]]; then
echo "Running ketr.chat container not found."
else
for container in ${containers[@]}; do
echo "Container ${container} devices:"
docker exec -it ${container} ze-monitor
done
fi
```
If a ketr.chat container is running, you should see something like:
```
Container 5317c503e771 devices:
Device 1: 8086:A780 (Intel(R) UHD Graphics 770)
Device 2: 8086:E20B (Intel(R) Graphics [0xe20b])
```
You can then launch ze-monitor in that container specifying the device you wish to monitor:
```bash
containers=($(docker ps --filter "ancestor=ketr.chat" --format "{{.ID}}"))
docker exec -it ${containers[0]} ze-monitor --device 2
```