# AIRC (pronounced Eric)
AI is Really Cool
This project provides an AI chat client. It runs the neuralchat model, enhanced with a little bit of RAG (retrieval-augmented generation) to fetch news RSS feeds.
Internally, it is built using PyTorch 2.6, Intel IPEX/LLM, and Python 3.11 (several pip packages were not yet available for the Python 3.12 shipped with Ubuntu Oracular 24.10, which these containers are based on).
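
If you want to sanity-check the toolchain, the container shell (see Running, below) exposes the Python environment directly:

```bash
# Open a shell in the airc container, then verify the interpreter and PyTorch build.
docker compose run --rm airc shell
# Inside the container shell:
python --version                                     # expect Python 3.11.x
python -c 'import torch; print(torch.__version__)'   # expect 2.6.x
```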
NOTE: Intel Arc A-series graphics processors do not support fp64, so fp64 operations may need to be emulated or the model quantized. It has been a while since I've had an A-series GPU to test on, so if you run into problems please file an [issue](https://github.com/jketreno/airc/issues)--I have some routines I can put in, but don't have a way to test them.
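
If you do hit fp64 errors on an A-series part, one workaround to try is the FP64 software emulation provided by Intel's compute runtime. The variable names below are an assumption based on the upstream driver documentation; verify them against your installed driver before relying on them:

```bash
# Assumed Intel compute-runtime switches for FP64 emulation on GPUs
# without native fp64 (check your driver's documentation first).
export OverrideDefaultFP64Settings=1
export IGC_EnableDPEmulation=1
```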
# Installation
This project builds and runs inside Docker containers. As it was originally written to work on an Intel Arc B580 (Battlemage), it requires a kernel that supports that hardware, such as the one documented at [Intel Graphics Preview](https://github.com/canonical/intel-graphics-preview), which runs on Ubuntu Oracular (24.10).
NOTE: You need `docker compose` installed. See [Install Docker Engine on Ubuntu](https://docs.docker.com/engine/install/ubuntu/).
## Want to run under WSL2? No can do...
https://www.intel.com/content/www/us/en/support/articles/000093216/graphics/processor-graphics.html
The A- and B-series discrete GPUs do not support SR-IOV, which is required for the GPU partitioning that Microsoft Windows uses to provide GPU acceleration in WSL.
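
You can confirm this on a Linux host by checking whether the GPU's PCI capabilities advertise SR-IOV (the device address below is a placeholder; find yours with the first command):

```bash
# List Intel GPUs, then inspect one for the SR-IOV capability.
# Replace <bus:dev.fn> with the address reported by the first command.
lspci | grep -iE 'vga|display'
sudo lspci -vvv -s <bus:dev.fn> | grep -i 'Single Root I/O Virtualization'
```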
## Building
```bash
git clone https://github.com/jketreno/airc
cd airc
docker compose build
```
## Containers
This project provides the following containers:
| Container | Purpose                                                    |
|:----------|:-----------------------------------------------------------|
| airc      | Base container with GPU packages installed and configured  |
| jupyter   | airc + Jupyter notebook for running Jupyter sessions       |
| miniircd  | Tiny deployment of an IRC server for testing IRC agents    |
| ollama    | Installation of Intel's pre-built Ollama.cpp               |
While developing airc, Hugging Face is sometimes used directly, with models loaded via PyTorch; at other times, especially during rapid development, the ollama deployment is used. This combination allows you to easily exercise local GPUs via either the local ollama server or the Hugging Face code paths.
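
For example, with the ollama container running, you can exercise a model directly over Ollama's REST API. This sketch assumes the compose file publishes Ollama's default port 11434 to the host; adjust the address if yours differs:

```bash
# Ask a pulled model for a completion via Ollama's /api/generate endpoint.
# Assumes port 11434 is published to the host in docker-compose.yml.
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:7b",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'
```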
To see which models are easily deployable with Ollama, see the [Ollama Model List](https://ollama.com/search).
Prior to using a new model, you need to download it:
```bash
MODEL=qwen2.5:7b
docker compose exec -it ollama ollama pull ${MODEL}
```
To download many common models to test against, you can use the `fetch-models.sh` script, which will download:
* qwen2.5:7b
* llama3.2
* mxbai-embed-large
* deepseek-r1:7b
* mistral:7b
```bash
docker compose exec -it ollama /fetch-models.sh
```
The persisted volume mount can grow quite large with models, GPU kernel caching, etc. During the development of this project, the `./cache` directory has grown to consume ~250G of disk space.
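
To see where that space is going, standard disk-usage tools work fine on the bind mount:

```bash
# Total size of the persisted cache, then a per-subdirectory breakdown.
du -sh ./cache
du -h --max-depth=1 ./cache | sort -h
```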
## Running
To download Hugging Face models, you need a Hugging Face token. See https://huggingface.co/settings/tokens for information on obtaining one.
Edit `.env` to add the following:
```.env
HF_ACCESS_TOKEN=<access token from huggingface>
HF_HOME=/root/.cache
```
HF_HOME is set for the containers to point to a volume-mounted directory, which enables model downloads to be persisted.
NOTE: Models downloaded by most examples will be placed in the `./cache` directory, which is bind-mounted into the container.
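
To confirm downloads are persisting, inspect the cache from the host. With HF_HOME set as above, Hugging Face keeps models under a `hub/` subdirectory, so (assuming at least one model has been downloaded):

```bash
# Hugging Face caches models under $HF_HOME/hub, which maps to ./cache/hub on the host.
ls ./cache/hub
```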
### AIRC
To launch the airc shell interactively with the PyTorch 2.6 environment loaded, use the default entrypoint:
```bash
docker compose run --rm airc shell
```
Once in the shell, you can launch `model-server.py` and then the `airc.py` client:
```bash
# Run these inside the airc shell launched above:
src/airc.py --ai-server=http://localhost:5000 &
src/model-server.py
```
By default, `src/airc.py` will connect to irc.libera.chat on the airc-test channel. See `python src/airc.py --help` for options.
By separating the model-server into its own process, you can develop and tweak the chat backend without losing the IRC connection established by airc.
### Jupyter
```bash
docker compose up jupyter -d
```
The default port for inbound connections is 8888 (see docker-compose.yml). `$(pwd)/jupyter` is bind-mounted to `/opt/jupyter` in the container, which is where notebooks will be saved by default.
To access the Jupyter notebook, go to `https://localhost:8888/jupyter`.
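
If the notebook prompts for a token, Jupyter prints a tokenized login URL to its logs at startup, which you can recover from the container:

```bash
# Jupyter logs a URL containing the access token when it starts.
docker compose logs jupyter | grep -i token
```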
### Monitoring
You can run `ze-monitor` within the launched containers to monitor GPU usage.
```bash
containers=($(docker ps --filter "ancestor=airc" --format "{{.ID}}"))
if [[ ${#containers[@]} -eq 0 ]]; then
    echo "Running airc container not found."
else
    for container in "${containers[@]}"; do
        echo "Container ${container} devices:"
        docker exec -it "${container}" ze-monitor
    done
fi
```
If an airc container is running, you should see something like:
```
Container 5317c503e771 devices:
Device 1: 8086:A780 (Intel(R) UHD Graphics 770)
Device 2: 8086:E20B (Intel(R) Graphics [0xe20b])
```
You can then launch `ze-monitor` in that container, specifying the device you wish to monitor:
```bash
containers=($(docker ps --filter "ancestor=airc" --format "{{.ID}}"))
docker exec -it "${containers[0]}" ze-monitor --device 2
```