
{"id":5132,"date":"2025-01-11T18:51:37","date_gmt":"2025-01-11T18:51:37","guid":{"rendered":"https:\/\/blog.gordonbuchan.com\/blog\/?p=5132"},"modified":"2025-01-29T13:21:26","modified_gmt":"2025-01-29T13:21:26","slug":"using-ollama-to-host-an-llm-on-cpu-only-equipment-to-enable-a-local-chatbot-and-llm-api-server","status":"publish","type":"post","link":"https:\/\/blog.gordonbuchan.com\/blog\/index.php\/2025\/01\/11\/using-ollama-to-host-an-llm-on-cpu-only-equipment-to-enable-a-local-chatbot-and-llm-api-server\/","title":{"rendered":"Using Ollama to host an LLM on CPU-only equipment to enable a local chatbot and LLM API"},"content":{"rendered":"\n<p><br>In this post, we install the Ollama LLM hosting software, and load a large language model (LLM), a 5GB file produced by a company called Mistral. We then test local inference, interacting with the model at the command line. We send test queries to the application programming interface (API) server. We install an application called Open-WebUI that enables a web chat interface to the LLM.<\/p>\n\n\n\n<p>Note: this procedure references the mistral model; however, you can specify other models, such as dolphin-mistral. Consult the following page for available models. Unless you have a GPU, limit your choices to models of roughly 7B parameters.<\/p>\n\n\n\n<p><a href=\"https:\/\/ollama.com\/library?sort=newest\">https:\/\/ollama.com\/library?sort=newest<\/a><\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Using the CPU servers we have now<\/h1>\n\n\n\n<p>Until 2023, graphics processing units (GPUs) were mainly of interest to video gamers, animators, and mechanical designers. Most new servers will now need GPU resources for local inference and retrieval-augmented generation (RAG). However, we need an interim approach that uses the CPU-centric servers we already have, even for some AI inference tasks, until capex cycles refresh 3-4 years from now. 
On a CPU-only system, response time for a query can range from 2-5 seconds to 30-40 seconds. This level of performance may be acceptable for some use cases, including scripted tasks for which a 40-second delay is not material. Deploying this solution on a system with even a modest Nvidia GPU dramatically improves performance.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Why host an LLM locally<\/h1>\n\n\n\n<ul class=\"wp-block-list\">\n<li>To learn how LLMs are built<\/li>\n\n\n\n<li>To achieve data sovereignty by operating on a private system<\/li>\n\n\n\n<li>To save expense by avoiding the need for external LLM vendors<\/li>\n<\/ul>\n\n\n\n<h1 class=\"wp-block-heading\">Preparing a computer for deployment<\/h1>\n\n\n\n<p>This procedure was tested on Ubuntu Server 24.04. Bare metal is better than a virtual machine for this use case, as it allows the software to access all of the resources of the host system. In terms of resources, you will need a relatively powerful CPU, like an i7, and 16-32GB of RAM.<\/p>\n\n\n\n<p>Note: the version of Python required by Open-WebUI is Python 3.12, which is supported by default in Ubuntu Server 24.04 LTS. If you are on an older version of the operating system, you can install a newer version of Python using a <a href=\"https:\/\/askubuntu.com\/questions\/1398568\/installing-python-who-is-deadsnakes-and-why-should-i-trust-them\">PPA<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Do you need a GPU?<\/h2>\n\n\n\n<p>No, but a GPU will make inference noticeably faster. If you have an Nvidia GPU, ensure that you have Nvidia CUDA drivers enabled. If you have an AMD GPU, ensure that you have AMD ROCm drivers. 
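<\/p>\n\n\n\n<p>If you want to confirm whether Ollama detected a GPU after installation, one quick check (assuming the systemd service created by the installation script below; the exact log wording varies by version) is to search the service log:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\njournalctl -u ollama --no-pager | grep -i -E &quot;gpu|cuda|rocm&quot;\n<\/pre><\/div>\n\n\n<p>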
There is some talk of support for Intel GPUs, but none of it is yet practical.<\/p>\n\n\n\n<p>Note: if you have an Nvidia GPU, you may want to consider <a href=\"https:\/\/github.com\/vllm-project\/vllm\">vLLM<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Ollama is able to work on a CPU-only system<\/h2>\n\n\n\n<p>Ollama works on a CPU-only system, and that is what we will implement in this post. It can achieve performance that is acceptable for certain kinds of operations: for example, large batch operations that run overnight can tolerate a 30-60 second delay, versus 2-10 seconds for a GPU-driven solution. For some questions, like \u201cwhy is the sky blue?\u201d an answer will start immediately. For more complex questions, there may be a 5-10 second delay before the answer begins, and the text will arrive slowly enough to remind you of 300 baud modems (for those of you who get that reference). The wonder of a dancing bear is not in how well it dances, but that it dances at all. 
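<\/p>\n\n\n\n<p>Delays matter less when queries are sent from a script rather than typed interactively. As a minimal sketch of programmatic access (Python standard library only; it assumes Ollama is installed and the mistral model pulled, as described below, and uses the generate endpoint in non-streaming mode):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nimport json\nimport urllib.request\n\n# Send a single prompt to the local Ollama API and print the reply.\nreq = urllib.request.Request(\n    &quot;http:\/\/localhost:11434\/api\/generate&quot;,\n    data=json.dumps({&quot;model&quot;: &quot;mistral&quot;,\n                     &quot;prompt&quot;: &quot;Why is the sky blue?&quot;,\n                     &quot;stream&quot;: False}).encode(),\n    headers={&quot;Content-Type&quot;: &quot;application\/json&quot;},\n)\nwith urllib.request.urlopen(req) as resp:\n    print(json.loads(resp.read())&#x5B;&quot;response&quot;])\n<\/pre><\/div>\n\n\n<p>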
This level of performance may be acceptable for some use cases, in particular batch operations and programmatic access via custom functions that send commands to the API server.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Escalating to root using sudo<\/h2>\n\n\n\n<p>From a shell, enter the following command:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nsudo su\n<\/pre><\/div>\n\n\n<p>(enter the password when requested)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Opening ports in the UFW firewall<\/h2>\n\n\n\n<p>You may need to open ports in the UFW firewall: 11434 for the Ollama API server and 8080 for the Open-WebUI chat client.<\/p>\n\n\n\n<p>Enter the following commands:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nufw allow 11434\/tcp\nufw allow 8080\/tcp\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">Ensuring that the system is up to date<\/h2>\n\n\n\n<p>Enter the following commands:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\napt clean\napt update\napt upgrade\napt install curl python3-venv python3-pip ffmpeg\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">A note re RHEL and variants like Fedora and AlmaLinux<\/h2>\n\n\n\n<p>Although this procedure has not been tested on RHEL and variants like Fedora and AlmaLinux, I looked at the installation script and those platforms are supported. In theory, you could configure an RHEL-type system by using equivalent firewall-cmd and dnf commands.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Installing Ollama using the installation script<\/h1>\n\n\n\n<p>Ollama provides an installation script that automates the process. 
From a shell as root, enter the following command:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ncurl -fsSL https:\/\/ollama.com\/install.sh | sh\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"866\" height=\"629\" src=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-21.png\" alt=\"\" class=\"wp-image-5188\" srcset=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-21.png 866w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-21-300x218.png 300w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-21-768x558.png 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Pulling the mistral image<\/h2>\n\n\n\n<p>Enter the following command:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nollama pull mistral\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"866\" height=\"629\" src=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-7.png\" alt=\"\" class=\"wp-image-5141\" srcset=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-7.png 866w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-7-300x218.png 300w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-7-768x558.png 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Listing the images available<\/h2>\n\n\n\n<p>Enter the following command:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; 
title: ; notranslate\" title=\"\">\nollama list\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"866\" height=\"629\" src=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-8.png\" alt=\"\" class=\"wp-image-5142\" srcset=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-8.png 866w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-8-300x218.png 300w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-8-768x558.png 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Testing Ollama and the LLM using the command line<\/h2>\n\n\n\n<p>Enter the following command. Test the chat interface on the command line in the shell:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nollama run mistral\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"866\" height=\"629\" src=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-9.png\" alt=\"\" class=\"wp-image-5143\" srcset=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-9.png 866w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-9-300x218.png 300w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-9-768x558.png 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Testing the API server using curl<\/h2>\n\n\n\n<p>Enter the following commands:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nsystemctl restart ollama\nsystemctl status 
ollama\nsystemctl enable ollama\ncurl http:\/\/localhost:11434\/api\/generate -d &#039;{\n&quot;model&quot;: &quot;mistral&quot;,\n&quot;prompt&quot;:&quot;Why is the sky blue?&quot;\n}&#039;\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"866\" height=\"629\" src=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-22.png\" alt=\"\" class=\"wp-image-5189\" srcset=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-22.png 866w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-22-300x218.png 300w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-22-768x558.png 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<p>Enter the following command:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ncurl http:\/\/localhost:11434\/api\/chat -d &#039;{\n&quot;model&quot;: &quot;mistral&quot;,\n&quot;messages&quot;: &#x5B;\n{ &quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;why is the sky blue?&quot; }\n]\n}&#039;\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"866\" height=\"629\" src=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-11.png\" alt=\"\" class=\"wp-image-5146\" srcset=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-11.png 866w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-11-300x218.png 300w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-11-768x558.png 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<h1 class=\"wp-block-heading\">Preparing the system for 
Open-WebUI<\/h1>\n\n\n\n<p>To prepare the system for Open-WebUI, we must create a working directory and a venv (virtual Python environment).<\/p>\n\n\n\n<p>Enter the following commands:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ncd ~\npwd\nmkdir ollamatmp\ncd ollamatmp\npython3 -m venv ollama_env\nsource ollama_env\/bin\/activate\npip install open-webui\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"866\" height=\"629\" src=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-24.png\" alt=\"\" class=\"wp-image-5191\" srcset=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-24.png 866w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-24-300x218.png 300w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-24-768x558.png 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"866\" height=\"629\" src=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-13.png\" alt=\"\" class=\"wp-image-5148\" srcset=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-13.png 866w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-13-300x218.png 300w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-13-768x558.png 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Starting the open-webui serve process<\/h2>\n\n\n\n<p>Enter the following command:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" 
title=\"\">\nopen-webui serve\n<\/pre><\/div>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"866\" height=\"629\" src=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-25.png\" alt=\"\" class=\"wp-image-5192\" srcset=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-25.png 866w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-25-300x218.png 300w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-25-768x558.png 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"866\" height=\"629\" src=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-16.png\" alt=\"\" class=\"wp-image-5151\" srcset=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-16.png 866w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-16-300x218.png 300w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-16-768x558.png 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Visiting the Open-WebUI web page interface<\/h2>\n\n\n\n<p>Using a web browser, visit this address:<\/p>\n\n\n\n<p><a href=\"http:\/\/127.0.0.1:8080\/\">http:\/\/127.0.0.1:8080<\/a><\/p>\n\n\n\n<p>You will be prompted to create an admin account:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"575\" height=\"522\" src=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-17.png\" alt=\"\" class=\"wp-image-5152\" srcset=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-17.png 575w, 
https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-17-300x272.png 300w\" sizes=\"auto, (max-width: 575px) 100vw, 575px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"606\" src=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-18-1024x606.png\" alt=\"\" class=\"wp-image-5153\" srcset=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-18-1024x606.png 1024w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-18-300x178.png 300w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-18-768x455.png 768w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-18.png 1432w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Using the Open-WebUI chat interface<\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"623\" src=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-19-1024x623.png\" alt=\"\" class=\"wp-image-5154\" srcset=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-19-1024x623.png 1024w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-19-300x182.png 300w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-19-768x467.png 768w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-19.png 1462w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<p><br>This window took 30 seconds to begin showing its answer, then another 20 seconds to complete generating the answer:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" 
width=\"1024\" height=\"606\" src=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-20-1024x606.png\" alt=\"\" class=\"wp-image-5155\" srcset=\"https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-20-1024x606.png 1024w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-20-300x178.png 300w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-20-768x455.png 768w, https:\/\/blog.gordonbuchan.com\/blog\/wp-content\/uploads\/2025\/01\/image-20.png 1432w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Using nginx as a proxy to expose the API port to the local network<\/h1>\n\n\n\n<p>By default, the Ollama API server answers on port 11434 but only on the local address 127.0.0.1. You can use nginx as a proxy to expose the API to the local network. Enter the following commands:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nufw allow 8085\/tcp\napt install nginx\ncd \/etc\/nginx\/sites-enabled\nnano default\n<\/pre><\/div>\n\n\n<p>Use the nano editor to add the following text:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nserver {\n    listen 8085;\n\n    location \/ {\n        proxy_pass http:\/\/127.0.0.1:11434;  # Replace with your Ollama API port\n        proxy_http_version 1.1;\n        proxy_set_header Upgrade $http_upgrade;\n        proxy_set_header Connection &#039;upgrade&#039;;\n        proxy_set_header Host $host;\n        proxy_cache_bypass $http_upgrade;\n\n        # Optional: Add timeout settings for long-running API calls\n        proxy_connect_timeout 60s;\n        proxy_read_timeout 60s;\n        proxy_send_timeout 60s;\n    }\n}\n<\/pre><\/div>\n\n\n<p><\/p>\n\n\n\n<p>Save and exit the 
file.<\/p>\n\n\n\n<p>Enter this command:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nsystemctl restart nginx\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\">Testing the exposed API port from another computer<\/h2>\n\n\n\n<p>From another computer, enter the command (where xxx.xxx.xxx.xxx is the IP address of the computer hosting the Ollama API server):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ncurl http:\/\/xxx.xxx.xxx.xxx:8085\/api\/chat -d &#039;{\n&quot;model&quot;: &quot;mistral&quot;,\n&quot;messages&quot;: &#x5B;\n{ &quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;why is the sky blue?&quot; }\n]\n}&#039;\n<\/pre><\/div>\n\n\n<h1 class=\"wp-block-heading\">Creating a systemd service to start the Open-WebUI chat interface automatically<\/h1>\n\n\n\n<p>Enter the following command:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nnano \/etc\/systemd\/system\/open-webui.service\n<\/pre><\/div>\n\n\n<p>Use the nano editor to add the following text:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n&#x5B;Unit]\nDescription=Open-WebUI Service\nAfter=network.target\n\n&#x5B;Service]\nUser=root\nWorkingDirectory=\/root\/ollamatmp\/ollama_env\nExecStart=\/usr\/bin\/bash -c &quot;source \/root\/ollamatmp\/ollama_env\/bin\/activate &amp;&amp; open-webui serve&quot;\nRestart=always\nEnvironment=PYTHONUNBUFFERED=1\nStandardOutput=journal\nStandardError=journal\n\n&#x5B;Install]\nWantedBy=multi-user.target\n\n<\/pre><\/div>\n\n\n<p>Save and exit the file.<\/p>\n\n\n\n<p>Enter the following commands:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nsystemctl daemon-reload\nsystemctl 
start open-webui\nsystemctl enable open-webui\nsystemctl status open-webui\n<\/pre><\/div>\n\n\n<h1 class=\"wp-block-heading\">Conclusion<\/h1>\n\n\n\n<p>You now have a local LLM API server and a web chat interface for interactive access.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">References<\/h1>\n\n\n\n<p><a href=\"https:\/\/ollama.com\">https:\/\/ollama.com<\/a><\/p>\n\n\n\n<p><a href=\"https:\/\/github.com\/ollama\/ollama\">https:\/\/github.com\/ollama\/ollama<\/a><\/p>\n\n\n\n<h1 class=\"wp-block-heading\">A related post<\/h1>\n\n\n\n<p>You may find the following post to be of interest: <a href=\"https:\/\/blog.gordonbuchan.com\/blog\/index.php\/2025\/01\/18\/creating-a-script-that-analyzes-email-messages-messages-using-a-large-language-model-llm-and-where-appropriate-escalates-messages-to-the-attention-of-an-operator\/\">Creating a script that analyzes email messages using a large language model (LLM), and where appropriate escalates messages to the attention of an operator<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post, we install the Ollama LLM hosting software, and load a large language model (LLM), a 5GB file produced by a company called Mistral. We then test local inference, interacting with the model at the command line. We send test queries to the application programming interface (API) server. 
We install an application called &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/blog.gordonbuchan.com\/blog\/index.php\/2025\/01\/11\/using-ollama-to-host-an-llm-on-cpu-only-equipment-to-enable-a-local-chatbot-and-llm-api-server\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Using Ollama to host an LLM on CPU-only equipment to enable a local chatbot and LLM API&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-5132","post","type-post","status-publish","format-standard","hentry","category-linux"],"_links":{"self":[{"href":"https:\/\/blog.gordonbuchan.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/5132","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.gordonbuchan.com\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.gordonbuchan.com\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.gordonbuchan.com\/blog\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.gordonbuchan.com\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=5132"}],"version-history":[{"count":53,"href":"https:\/\/blog.gordonbuchan.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/5132\/revisions"}],"predecessor-version":[{"id":5326,"href":"https:\/\/blog.gordonbuchan.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/5132\/revisions\/5326"}],"wp:attachment":[{"href":"https:\/\/blog.gordonbuchan.com\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=5132"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.gordonbuchan.com\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=5132"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.gordonbuchan.com\/blog\/index.php\/wp-json\/wp\/v2\/tags?post=5132"}],"cu
ries":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}