AIRC (pronounced Eric)
AI is Really Cool
This project provides an AI chat client. It runs the neuralchat model, enhanced with a little bit of RAG to fetch news RSS feeds.
Internally, it is built using PyTorch 2.6, Intel IPEX-LLM, and Python 3.11 (several pip packages were not yet available for Python 3.12, which ships with Ubuntu Oracular 24.10, the release these containers are based on).
NOTE: If running on an Intel Arc A-series graphics processor, fp64 is not supported and may need to be emulated, or the model quantized. It has been a while since I've had an A-series GPU to test on, so if you run into problems please file an issue--I have some routines I can put in, but don't have a way to test them.
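If you do hit fp64 failures on an A-series part, one thing worth trying is the FP64 software emulation built into Intel's compute runtime. These environment variables belong to Intel's driver stack, not to this project, so treat this as a sketch to experiment with rather than a supported configuration:

# Driver-level FP64 emulation in the Intel compute runtime (not part of
# airc); set in the environment before launching the GPU workload.
export OverrideDefaultFP64Settings=1
export IGC_EnableDPEmulation=1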
Installation
This project builds inside Docker containers. As it was originally written to work on an Intel Arc B580 (Battlemage), it requires a kernel that supports that hardware, such as the one documented at Intel Graphics Preview, which runs on Ubuntu Oracular (24.10).
NOTE: You need 'docker compose' installed. See Install Docker Engine on Ubuntu
Want to run under WSL2? No can do...
https://www.intel.com/content/www/us/en/support/articles/000093216/graphics/processor-graphics.html
The A- and B-series discrete GPUs do not support SR-IOV, required for the GPU partitioning that Microsoft Windows uses in order to support GPU acceleration in WSL.
Building
git clone https://github.com/jketreno/airc
cd airc
docker compose build
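If you only need to rebuild a single image, for example after changing one Dockerfile, you can pass the service name to docker compose (the services match the container table in the Containers section below):

# Rebuild just the base airc image rather than every service.
docker compose build airc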
Containers
This project provides the following containers:
| Container | Purpose |
|---|---|
| airc | Base container with GPU packages installed and configured |
| jupyter | airc + Jupyter notebook for running Jupyter sessions |
| miniircd | Tiny deployment of an IRC server for testing IRC agents |
| ollama | Installation of Intel's pre-built Ollama.cpp |
While developing airc, Hugging Face is sometimes used directly, with models loaded via PyTorch; at other times, especially during rapid development, the ollama deployment is used. Either path gives you easy access to the local GPU (through the local ollama service or the HF code).
To see which models are easily deployable with Ollama, see the Ollama Model List.
Prior to using a new model, you need to download it:
MODEL=qwen2.5:7b
docker compose exec -it ollama ollama pull ${MODEL}
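Once a model is pulled, a quick way to sanity-check the ollama path is to hit Ollama's standard REST API, which listens on port 11434 by default. This assumes curl is available inside the ollama container; the model name is just an example:

# /api/generate is stock Ollama's REST route; the model must already be pulled.
docker compose exec -it ollama curl -s http://localhost:11434/api/generate \
  -d '{"model": "qwen2.5:7b", "prompt": "Say hello in one sentence.", "stream": false}'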
To download many common models for testing against, you can use the fetch-models.sh script, which will download:
- qwen2.5:7b
- llama3.2
- mxbai-embed-large
- deepseek-r1:7b
- mistral:7b
docker compose exec -it ollama /fetch-models.sh
The persisted volume mount can grow quite large with models, GPU kernel caching, etc. During the development of this project, the ./cache directory has grown to consume ~250G of disk space.
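To check how much space the cache is currently taking:

# Report the total size of the persisted cache directory.
du -sh ./cache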
Running
To download Hugging Face models, you need a Hugging Face token. See https://huggingface.co/settings/tokens for information on obtaining one.
Edit .env to add the following:
HF_ACCESS_TOKEN=<access token from huggingface>
HF_HOME=/root/.cache
HF_HOME points to a volume-mounted directory inside the containers so that downloaded models persist across runs.
NOTE: Models downloaded by most examples will be placed in the ./cache directory, which is bind mounted to the container.
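Before kicking off a large download, you can verify the token is being picked up. This assumes the huggingface_hub pip package, which provides the huggingface-cli entry point, is installed in the container image:

docker compose run --rm airc shell
# Inside the container shell: prints the account the token authenticates as.
huggingface-cli whoami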
AIRC
To launch an interactive airc shell with the PyTorch 2.6 environment loaded, use the default entrypoint:
docker compose run --rm airc shell
Once in the shell, you can then launch the model-server.py and then the airc.py client:
docker compose run --rm airc shell
src/airc.py --ai-server=http://localhost:5000 &
src/model-server.py
By default, src/airc.py will connect to irc.libera.chat on the airc-test channel. See python src/airc.py --help for options.
By separating the model-server into its own process, you can develop and tweak the chat backend without losing the IRC connection established by airc.
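Because the backend is just an HTTP service on port 5000, you can also poke it directly while iterating. The route and payload below are hypothetical, shown only to illustrate the pattern; check src/model-server.py for the actual API:

# Hypothetical route/payload for illustration only -- see src/model-server.py
# for the real endpoints.
curl -X POST http://localhost:5000/chat \
  -H 'Content-Type: application/json' \
  -d '{"message": "Hello, AIRC"}'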
Jupyter
docker compose up jupyter -d
The default port for inbound connections is 8888 (see docker-compose.yml). $(pwd)/jupyter is bind mounted to /opt/jupyter in the container, which is where notebooks will be saved by default.
To access the Jupyter notebook, go to https://localhost:8888/jupyter.
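If the notebook asks for a login token, stock Jupyter prints a tokenized login URL to the server log on startup (assuming this deployment keeps token auth enabled), which you can read with:

# Look for the line containing "token=" in the startup output.
docker compose logs jupyter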
Monitoring
You can run ze-monitor within the launched containers to monitor GPU usage.
containers=($(docker ps --filter "ancestor=airc" --format "{{.ID}}"))
if [[ ${#containers[@]} -eq 0 ]]; then
  echo "Running airc container not found."
else
  for container in "${containers[@]}"; do
    echo "Container ${container} devices:"
    docker exec -it "${container}" ze-monitor
  done
fi
If an airc container is running, you should see something like:
Container 5317c503e771 devices:
Device 1: 8086:A780 (Intel(R) UHD Graphics 770)
Device 2: 8086:E20B (Intel(R) Graphics [0xe20b])
You can then launch ze-monitor in that container specifying the device you wish to monitor:
containers=($(docker ps --filter "ancestor=airc" --format "{{.ID}}"))
docker exec -it ${containers[0]} ze-monitor --device 2