Content update

James Ketr 2025-04-02 17:44:07 -07:00
parent 5dca368a87
commit 1c6f99a6ae

# Ketr Chat
This LLM agent was built by James Ketrenos to answer any questions you may have about his work history.
In addition to being a RAG-enabled expert system, the LLM is configured with real-time access to weather, stocks, and the current time, and it can answer questions about the contents of a website.
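For illustration, the following is a minimal sketch of how a tool such as the current-time lookup could be described to a tool-calling LLM and dispatched on the backend; the schema layout and function names here are assumptions, not the project's actual code.

```python
# Minimal sketch (not the project's actual code) of an OpenAI/ollama-style tool
# definition plus the dispatcher that runs it when the LLM requests the call.
from datetime import datetime, timezone

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Return the current UTC time in ISO 8601 format.",
            "parameters": {"type": "object", "properties": {}},
        },
    }
]

def dispatch_tool_call(name: str, arguments: dict) -> str:
    """Route a tool call requested by the LLM to its Python implementation."""
    if name == "get_current_time":
        return datetime.now(timezone.utc).isoformat()
    raise ValueError(f"Unknown tool: {name}")
```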
## Parts of Ketr Chat
* Backend Server
  Provides a custom REST API to support the capabilities exposed from the web UI.
  * PyTorch used for LLM communication and inference
  * ChromaDB as a vector store for embedding similarities
  * FastAPI for the HTTP REST API endpoints
  * Serves the static site for production deployment
  * Performs all communication with the LLM (currently via ollama, though I may switch back to Hugging Face transformers)
  * Implements the tool subsystem for tool callbacks from the LLM
  * Manages a ChromaDB vector store, including the chunking and embedding of the documents used to provide RAG content related to my career (see the sketch after this list)
  * Manages all context sessions
  * Currently using qwen2.5:7b, though I frequently switch between models (llama3.2, deepseek-r1:7b, and mistral:7b). I've generally had the best results from qwen2.5. DeepSeek-R1 was very cool; the thinking phase was informative for developing system prompts, but the ollama integration does not support tool calls. That is one reason I'm looking to switch back to Hugging Face transformers.
  * Languages: Python, bash
* Web Frontend
  Provides a responsive UI for interacting with the system.
  * Written using React and MUI.
  * Exposes enough information to know what the LLM is doing on the backend
  * Enables adjusting various parameters, including enabling/disabling tools and RAG, the system prompt, etc.
  * Configured to run in both development and production. In development mode, the server does not serve the Web Frontend and only acts as a REST API endpoint.
  * Languages: JSX, JavaScript, TypeScript, bash
* Ollama container
  If you don't already have ollama installed and running, the container provided in this project is built using the Intel pre-built Ollama package.
* Jupyter notebook
  To facilitate rapid development and prototyping, a Jupyter notebook is provided which runs on the same Python package set as the main server container.
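As a rough illustration of the RAG pipeline mentioned in the Backend Server list above, the sketch below shows how career documents might be chunked, embedded into a ChromaDB collection, and queried at chat time. The collection name, chunk sizes, and file layout are assumptions, not the project's actual code.

```python
# Minimal sketch, assuming a ./docs directory of plain-text career documents
# and ChromaDB's default embedding function; not the project's actual code.
from pathlib import Path

import chromadb

client = chromadb.PersistentClient(path="./cache/chromadb")
collection = client.get_or_create_collection(name="career_docs")

def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split a document into overlapping character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

for doc in Path("docs").glob("*.txt"):
    for i, piece in enumerate(chunk(doc.read_text())):
        collection.add(ids=[f"{doc.stem}-{i}"], documents=[piece],
                       metadatas=[{"source": doc.name}])

# At chat time, the most similar chunks are retrieved and prepended to the prompt.
results = collection.query(query_texts=["What did James work on at Intel?"], n_results=3)
print(results["documents"][0])
```

Overlapping chunks are one common way to keep sentences that straddle a chunk boundary retrievable; the exact chunking strategy used by the server may differ.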
# Installation
This project builds and runs in Docker containers. As it was originally written to target an Intel Arc B580 (Battlemage), it requires a kernel that supports that hardware, such as the one documented at [Intel Graphics Preview](https://github.com/canonical/intel-graphics-preview), which runs in Ubuntu Oracular (24.10).
NOTE: You need 'docker compose' installed. See [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/)
## Want to run under WSL2? No can do...
https://www.intel.com/content/www/us/en/support/articles/000093216/graphics/processor-graphics.html
The A- and B-series discrete GPUs do not support SR-IOV, which is required for the GPU partitioning Microsoft Windows uses to provide GPU acceleration in WSL.
## Building
```bash
git clone https://github.com/jketreno/ketr.chat
cd ketr.chat
docker compose build
```
## Running
To download the models, you need a Hugging Face token. See https://huggingface.co/settings/tokens for information on obtaining one.
Edit .env to add the following:
```.env
HF_ACCESS_TOKEN=<access token from huggingface>
```
NOTE: Models downloaded by most examples will be placed in the ./cache directory, which is bind mounted to the container.
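As a sketch of how that token is typically consumed inside the container (how server.py actually uses it is an assumption), the huggingface_hub library can authenticate with it before any gated model downloads:

```python
# Sketch only: authenticate huggingface_hub with the token from .env so gated
# model downloads succeed. HF_ACCESS_TOKEN matches the .env example above;
# the exact way server.py consumes it is an assumption.
import os

from huggingface_hub import login

token = os.environ.get("HF_ACCESS_TOKEN")
if token:
    login(token=token)
```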
### Ketr Chat
To launch an interactive shell in the ketr.chat container with the PyTorch 2.6 environment loaded, use the default entrypoint:
```bash
docker compose run --rm ketr.chat shell
```
Once in the shell, you can then launch server.py:
```bash
docker compose run --rm ketr.chat shell
python src/server.py
```
If you launch the server without any parameters, it will run the backend server, which will host the static web frontend built during the `docker compose build`.
That is also the behavior when you bring the container up:
```bash
docker compose up -d
```
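Serving the prebuilt frontend from FastAPI typically looks something like the sketch below; the `frontend/build` directory is an assumption about this project's layout, not necessarily what server.py does.

```python
# Minimal sketch of serving a prebuilt React bundle from FastAPI.
# The directory "frontend/build" is an assumption about the project layout.
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

app = FastAPI()

# API routes would be registered here, before the static mount catches "/".
app.mount("/", StaticFiles(directory="frontend/build", html=True), name="frontend")
```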
### Jupyter
```bash
docker compose up jupyter -d
```
The default port for inbound connections is 8888 (see docker-compose.yml). `$(pwd)/jupyter` is bind mounted to `/opt/jupyter` in the container, which is where notebooks are saved by default.
To access the Jupyter notebook, go to `https://localhost:8888/jupyter`.
### Monitoring
You can run `ze-monitor` within the launched containers to monitor GPU usage.
```bash
containers=($(docker ps --filter "ancestor=ketr.chat" --format "{{.ID}}"))
if [[ ${#containers[*]} -eq 0 ]]; then
echo "Running ketr.chat container not found."
else
for container in ${containers[@]}; do
echo "Container ${container} devices:"
docker exec -it ${container} ze-monitor
done
fi
```
If a ketr.chat container is running, you should see something like:
```
Container 5317c503e771 devices:
Device 1: 8086:A780 (Intel(R) UHD Graphics 770)
Device 2: 8086:E20B (Intel(R) Graphics [0xe20b])
```
You can then launch ze-monitor in that container specifying the device you wish to monitor:
```bash
containers=($(docker ps --filter "ancestor=ketr.chat" --format "{{.ID}}"))
docker exec -it ${containers[0]} ze-monitor --device 2
```