Ollama is the easiest way to get up and running with large language models such
as gpt-oss, Gemma 3, Qwen3 and more.
Software Category: data
For detailed information, visit the Ollama website.
Available Versions
To find the available versions and learn how to load them, run:
```
module spider ollama
```
The output of the command shows the available Ollama module versions.
For detailed information about a particular Ollama module, including how to load the module, run the module spider command with the module’s full version label. For example:
```
module spider ollama/0.13.1
```
| Module | Version | Module Load Command |
| ------ | ------- | ------------------- |
| ollama | 0.13.1 | `module load apptainer/1.3.4 ollama/0.13.1` |
Ollama Open OnDemand Interactive App
Request a session
To get to the interactive app:
- Open a web browser and go to: https://ood.hpc.virginia.edu.
- Log in with your NetBadge credentials.
- Click on “Interactive Apps” on the top bar.
- In the drop-down menu, click “Ollama.”
To fill out the form:
- Choose a model directory. Select “Predownloaded” if you wish to use the listed models. Otherwise, select “Home” to use your own models.
- Choose a partition. Only partitions that contain GPUs can be selected; the session will run on a single GPU device.
- Under “Optional GPU Type,” choose a GPU type or leave it as “default” (first available).
Click Launch to start the session.
This will start Ollama inside a JupyterLab session. The Ollama server is backed by an Apptainer container instance. The Python API is provided by a separate module, ollama-python.
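To confirm that a notebook can reach the Ollama server, a minimal check is shown below. This sketch assumes the session exports the OLLAMA_HOST environment variable (which the Python client reads automatically), as in the OpenAI API example further down.

```python
import ollama

# Query the running server for its installed models;
# this fails with a connection error if the server is unreachable
print(ollama.list())
```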
Download a model
If you selected “Home” for the model directory and wish to download a new LLM, click on File→New→Terminal to open a terminal window. Run:
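```
ollama pull <LLM>
```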
where <LLM> is the name of a large language model listed on the Ollama Models page. "Cloud" models require an API key. (Note: For your convenience, we set up an `ollama` alias for the actual Apptainer command.)
To list all available models, run:
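```
ollama list
```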
To remove a model, run:
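```
ollama rm <LLM>
```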
To remove all models, you may simply wipe the model directory (by default, ~/.ollama/models in your home directory):
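```
# Assumes models are stored in the default location ~/.ollama/models
rm -rf ~/.ollama/models
```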
Sample code
Copy and paste the following into a notebook cell. You may modify the prompt and the model; the model name must exactly match one of the models listed in the OOD form.
Ollama API example
```python
from ollama import chat
from IPython.display import display, Markdown, clear_output

prompt = "Why is the sky blue?"
model = 'gemma3:27b'

# Request a streaming response so tokens can be rendered as they arrive
response_stream = chat(
    model=model,
    messages=[{'role': 'user', 'content': prompt}],
    stream=True
)

# Accumulate tokens and re-render the Markdown output on each update
streamed_response = ""
for token in response_stream:
    streamed_response += token['message']['content']
    clear_output(wait=True)
    display(Markdown(f"**LLM Response (Streaming):**\n\n{streamed_response}"))
```
OpenAI API example
```python
import os
from openai import OpenAI

# The Ollama server exposes an OpenAI-compatible endpoint at $OLLAMA_HOST;
# no real key is needed, but the client requires a non-empty api_key
client = OpenAI(base_url=f"http://{os.environ['OLLAMA_HOST']}/v1", api_key='ollama')

response = client.chat.completions.create(
    model='gemma3:27b',
    messages=[
        {"role": "system", "content": "You are a friendly dog."},
        {"role": "user", "content": "Do you want a bone?"}
    ]
)
print(response.choices[0].message.content)
```