AIRC (pronounced Eric)

AI is Really Cool

This project provides an AI chat client. It runs the neuralchat model, augmented with a small retrieval-augmented generation (RAG) layer that pulls in news RSS feeds.
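
To make the RAG step concrete, here is a minimal sketch of how headlines from an RSS feed might be stitched into a prompt. The function names are illustrative and assume the feedparser package; this is not the project's actual code:

import feedparser

def fetch_headlines(feed_url: str, limit: int = 5) -> list[str]:
    # Parse the RSS feed and keep the most recent entry titles.
    feed = feedparser.parse(feed_url)
    return [entry.title for entry in feed.entries[:limit]]

def build_prompt(question: str, feed_url: str) -> str:
    # Prepend retrieved headlines as context ahead of the user's question.
    context = "\n".join(fetch_headlines(feed_url))
    return f"Recent headlines:\n{context}\n\nUser: {question}"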

Internally, it is built using PyTorch 2.6, Intel IPEX/LLM, and Python 3.11 (several pip packages were not yet available for Python 3.12, which ships with Ubuntu Oracular 24.10, the release these containers are based on).
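
A quick way to confirm that the environment inside a container can see the GPU (PyTorch 2.6 exposes Intel GPUs as the xpu device):

import torch

# Prints the PyTorch version and True when an Intel GPU is visible.
print(torch.__version__, torch.xpu.is_available())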

NOTE: Intel Arc A-series graphics processors do not support fp64, so fp64 operations may need to be emulated or the model quantized to a lower precision. It has been a while since I've had an A-series GPU to test on, so if you run into problems please file an issue; I have some routines I can put in, but no way to test them.
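
One common workaround is to load the model in a lower-precision dtype so no fp64 kernels are required. A sketch using plain Hugging Face transformers (the checkpoint name is illustrative, and this is not the project's actual loading code):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Intel/neural-chat-7b-v3-1"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
# fp16 weights avoid fp64 entirely and halve memory versus fp32.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model = model.to("xpu")  # Intel GPU device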

Installation

This project builds inside Docker containers. As it was originally written to target an Intel Arc B580 (Battlemage), it requires a kernel that supports that hardware, such as the one documented at Intel Graphics Preview, which runs on Ubuntu Oracular (24.10).

Want to run under WSL2? No can do...

https://www.intel.com/content/www/us/en/support/articles/000093216/graphics/processor-graphics.html

The A- and B-series discrete GPUs do not support SR-IOV, required for the GPU partitioning that Microsoft Windows uses in order to support GPU acceleration in WSL.

Building

NOTE: You need 'docker compose' installed. See Install Docker Engine on Ubuntu

git clone https://github.com/jketreno/airc
cd airc
docker compose build

Containers

This project provides the following containers:

Container   Purpose
airc        Base container with GPU packages installed and configured
jupyter     airc + Jupyter notebook for running Jupyter sessions
miniircd    Tiny deployment of an IRC server for testing IRC agents
ollama      Installation of Intel's pre-built Ollama

While developing airc, Hugging Face models are sometimes loaded directly via PyTorch; at other times, especially during rapid development, the ollama deployment is used. This combination makes it easy to exercise local GPUs through either the ollama service or the Hugging Face code paths.

To see which models are easily deployable with Ollama, see the Ollama Model List.

Prior to using a new model, you need to download it:

MODEL=qwen2.5:7b
docker compose exec -it ollama ollama pull ${MODEL}
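
Once pulled, you can talk to the model through Ollama's standard REST API; a minimal sketch using the requests library, assuming the ollama container publishes the default port 11434:

import requests

# One-shot, non-streaming completion against the local ollama service.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "qwen2.5:7b", "prompt": "Say hello in one sentence.", "stream": False},
)
print(resp.json()["response"])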

To download many common models for testing against, you can use the fetch-models.sh script, which fetches:

  • qwen2.5:7b
  • llama3.2
  • mxbai-embed-large
  • deepseek-r1:7b
  • mistral:7b

docker compose exec -it ollama /fetch-models.sh

The persisted volume mount can grow quite large with models, GPU kernel caching, etc. During the development of this project, the ./cache directory has grown to consume ~250G of disk space.

Running

In order to download Hugging Face models, you need a Hugging Face token. See https://huggingface.co/settings/tokens for how to obtain one.

Edit .env to add the following:

HF_ACCESS_TOKEN=<access token from huggingface>
HF_HOME=/root/.cache

HF_HOME points to a volume-mounted directory inside the containers so that downloaded models persist between runs.

NOTE: Models downloaded by most examples will be placed in the ./cache directory, which is bind mounted to the container.
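
To sanity-check the token from inside a container, the huggingface_hub package (a transformers dependency) can log in with it; a small sketch, assuming HF_ACCESS_TOKEN is exported into the environment as above:

import os
from huggingface_hub import login, whoami

# Authenticate with the token from .env and print the associated account name.
login(token=os.environ["HF_ACCESS_TOKEN"])
print(whoami()["name"])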

AIRC

To launch the airc shell interactively, with the PyTorch 2.6 environment loaded, use the default entrypoint:

docker compose run --rm airc shell

Once in the shell, launch the airc.py client in the background and then start model-server.py:

docker compose run --rm airc shell
src/airc.py --ai-server=http://localhost:5000 &
src/model-server.py

By default, src/airc.py will connect to irc.libera.chat on the airc-test channel. See python src/airc.py --help for options.

By separating the model-server into its own process, you can develop and tweak the chat backend without losing the IRC connection established by airc.
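
The wire format between the two is defined by src/model-server.py. Purely as an illustration, a client round-trip might look like the sketch below; the /chat endpoint and payload shape here are hypothetical, not taken from the project:

import requests

# Hypothetical request to the model server started above on port 5000.
resp = requests.post(
    "http://localhost:5000/chat",  # endpoint name is illustrative
    json={"message": "What's new in AI today?"},
)
print(resp.json())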

Jupyter

docker compose up jupyter -d

The default port for inbound connections is 8888 (see docker-compose.yml). $(pwd)/jupyter is bind mounted to /opt/jupyter in the container, which is where notebooks are saved by default.

To access the Jupyter notebook, go to https://localhost:8888/jupyter.

Monitoring

You can run ze-monitor within the launched containers to monitor GPU usage.

# Find all running containers created from the airc image.
containers=($(docker ps --filter "ancestor=airc" --format "{{.ID}}"))
if [[ ${#containers[@]} -eq 0 ]]; then
  echo "Running airc container not found."
else
  for container in "${containers[@]}"; do
    echo "Container ${container} devices:"
    docker exec -it "${container}" ze-monitor
  done
fi

If an airc container is running, you should see something like:

Container 5317c503e771 devices:
Device 1: 8086:A780 (Intel(R) UHD Graphics 770)
Device 2: 8086:E20B (Intel(R) Graphics [0xe20b])

You can then launch ze-monitor in that container, specifying the device you wish to monitor:

containers=($(docker ps --filter "ancestor=airc" --format "{{.ID}}"))
docker exec -it ${containers[0]} ze-monitor --device 2