diff --git a/src/doc/projects/ketr.chat.txt b/src/doc/projects/ketr.chat.txt
new file mode 100644
index 0000000..c3c98ad
--- /dev/null
+++ b/src/doc/projects/ketr.chat.txt
@@ -0,0 +1,146 @@
+# Ketr Chat
+
+This LLM agent was built by James Ketrenos to answer questions about his work history.
+
+In addition to being a RAG-enabled expert system, the LLM is configured with real-time access to weather, stocks, and the current time, and it can answer questions about the contents of a website.
+
+## Parts of Ketr Chat
+
+* Backend Server
+  Provides a custom REST API to support the capabilities exposed from the web UI.
+  * PyTorch used for LLM communication and inference
+  * ChromaDB as a vector store for embedding similarities
+  * FastAPI for the HTTP REST API endpoints
+  * Serves the static site for production deployment
+  * Performs all communication with the LLM (currently via Ollama, though I may switch back to Hugging Face Transformers)
+  * Implements the tool subsystem for tool callbacks from the LLM
+  * Manages a ChromaDB vector store, including the chunking and embedding of the documents used to provide RAG content related to my career
+  * Manages all context sessions
+  * Currently using qwen2.5:7b, though I frequently switch between models (llama3.2, deepseek-r1:7b, and mistral:7b). I've generally had the best results from qwen2.5. DeepSeek-R1 was very cool; the thinking phase was informative for developing system prompts, but its Ollama integration does not support tool calls. That is one reason I'm looking to switch back to Hugging Face Transformers.
+  * Languages: Python, bash
+
+* Web Frontend
+  Provides a responsive UI for interacting with the system.
+  * Written using React and MUI
+  * Exposes enough information to know what the LLM is doing on the backend
+  * Enables adjusting various parameters, including enabling/disabling tools and RAG, the system prompt, etc.
+  * Configured to run in both development and production. In development mode, the server does not serve the Web Frontend and only acts as a REST API endpoint.
+  * Languages: JSX, JavaScript, TypeScript, bash
+
+* Ollama container
+  If you don't already have Ollama installed and running, the container provided in this project is built using the Intel pre-built Ollama package.
+
+* Jupyter notebook
+  To facilitate rapid development and prototyping, a Jupyter notebook is provided which runs on the same Python package set as the main server container.
+
+# Installation
+
+This project uses docker containers to build. As this was originally written to work on an Intel Arc B580 (Battlemage), it requires a kernel that supports that hardware, such as the one documented at [Intel Graphics Preview](https://github.com/canonical/intel-graphics-preview), which runs in Ubuntu Oracular (24.10).
+
+NOTE: You need 'docker compose' installed. See [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/).
+
+## Want to run under WSL2? No can do...
+
+https://www.intel.com/content/www/us/en/support/articles/000093216/graphics/processor-graphics.html
+
+The A- and B-series discrete GPUs do not support SR-IOV, which is required for the GPU partitioning that Microsoft Windows uses to support GPU acceleration in WSL.
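+
+Before building, you can sanity-check that the kernel actually exposes the Arc GPU. This is a generic check, not specific to this project; device names vary by system:
+
+```bash
+# List DRM render nodes; a working GPU driver exposes renderD* entries.
+ls -l /dev/dri/
+
+# Confirm the PCI device is visible; the B580 shows up as a display controller.
+lspci -nn | grep -iE 'vga|display'
+```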
+
+## Building
+
+```bash
+git clone https://github.com/jketreno/ketr.chat
+cd ketr.chat
+docker compose build
+```
+
+## Running
+
+In order to download the models, you need a Hugging Face token. See https://huggingface.co/settings/tokens for information on obtaining one.
+
+Edit .env to add the following:
+
+```.env
+HF_ACCESS_TOKEN=
+```
+
+NOTE: Models downloaded by most examples will be placed in the ./cache directory, which is bind-mounted into the container.
+
+### Ketr Chat
+
+To launch the ketr.chat shell interactively with the PyTorch 2.6 environment loaded, use the default entrypoint:
+
+```bash
+docker compose run --rm ketr.chat shell
+```
+
+Once in the shell, you can then launch server.py:
+
+```bash
+python src/server.py
+```
+
+If you launch the server without any parameters, it will run the backend server, which hosts the static web frontend built during `docker compose build`.
+
+That is also the behavior if you bring the container up:
+
+```bash
+docker compose up -d
+```
+
+### Jupyter
+
+```bash
+docker compose up jupyter -d
+```
+
+The default port for inbound connections is 8888 (see docker-compose.yml). $(pwd)/jupyter is bind-mounted to /opt/jupyter in the container, which is where notebooks will be saved by default.
+
+To access the Jupyter notebook, go to `https://localhost:8888/jupyter`.
+
+### Monitoring
+
+You can run `ze-monitor` within the launched containers to monitor GPU usage.
+
+```bash
+containers=($(docker ps --filter "ancestor=ketr.chat" --format "{{.ID}}"))
+if [[ ${#containers[@]} -eq 0 ]]; then
+  echo "Running ketr.chat container not found."
+else
+  for container in "${containers[@]}"; do
+    echo "Container ${container} devices:"
+    docker exec -it "${container}" ze-monitor
+  done
+fi
+```
+
+If a ketr.chat container is running, you should see something like:
+
+```
+Container 5317c503e771 devices:
+Device 1: 8086:A780 (Intel(R) UHD Graphics 770)
+Device 2: 8086:E20B (Intel(R) Graphics [0xe20b])
+```
+
+You can then launch ze-monitor in that container, specifying the device you wish to monitor:
+
+```bash
+containers=($(docker ps --filter "ancestor=ketr.chat" --format "{{.ID}}"))
+docker exec -it "${containers[0]}" ze-monitor --device 2
+```
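+
+While monitoring the GPU, it can also be useful to follow the backend server logs in a second terminal. This is plain docker compose usage, assuming the ketr.chat service name from this project's docker-compose.yml:
+
+```bash
+# Stream logs from the running ketr.chat service (Ctrl-C to stop following).
+docker compose logs -f ketr.chat
+```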