An applied data science research project

Introduction

One of the few useful applications of generative AI is text summarization and subsequent meaning abstraction. Organizations managing topic-specific knowledge can improve the educational experience of their audience by offering a conversational interface (a chatbot) that allows users to reach understanding through natural language conversations.

Companies such as OpenAI, Google, and Microsoft sell subscription access to their Large Language Models (LLMs). These services let users apply the LLM engines to their own knowledge bases. Other platforms then offer deployment solutions that serve the customized LLM as a chatbot on a website.

Figure 1. IBM.com administering knowledge through a chatbot

These solutions have a recurrent cost, and the educational infrastructure is on a permanent lease and never fully owned.

However, some LLMs can be run locally and for free (examples include Llama and Granite).

In this project we want to find out whether we can create a chatbot with topic-specific knowledge using free, open-source technologies, so that any person or organization that is custodian of a knowledge base can deploy this solution at near-zero cost.

Knowledge: The Divine Comedy of Dante Alighieri, translated by Charles Eliot Norton

I’ll use this book as the knowledge for several reasons:

  • It is Public Domain, so there is no copyright infringement if we use it to test our prototype.
  • It is already curated as a txt file. While our bot can digest pdfs and even scanned pages, a clean txt file is much lighter to manage.
  • Since commercial LLMs such as ChatGPT have been trained on the book, we will be able to contrast how our local AI leverages direct access to the raw file and determine if it's better.
  • Using the text file will let us see retrieval-augmented generation (RAG) in action: our chatbot should be able to quote the text literally and tell us on which page to find the source.
  • We examine the possible application of the bot as a “book companion” to assist readers in their understanding of a book.


Objective

Implement a near-zero-cost chatbot built from technologies that are free to use and to own: Linux Ubuntu, Docker, Ollama, Llama3, OpenWebUI, LLaVA, and Docling.

(As a secondary objective, we also explore the possibility of inducing the chatbot to use the Socratic method to guide a productive conversation with the learner; this is out of scope for the present iteration.)

The depth and breadth of the knowledge will be limited by the capacities of the RAG component. In practice, though, the main limitation will be the hardware serving the chatbot and the LLM (the hosting VPS).

Supported knowledge file formats: .pdf, .docx, .mp3, .mp4, .ppt, .png.
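As an illustration of how a knowledge folder could be screened for those formats before ingestion, here is a small reproducible sketch (the sample file names are invented; a plain .txt file, like our edition of the Divine Comedy, is included as well):

```shell
#!/usr/bin/env bash
# Sketch: list which files in a knowledge folder match the accepted formats.
# Uses a throwaway directory so the example is reproducible anywhere.
set -euo pipefail
kb=$(mktemp -d)
touch "$kb/comedy.txt" "$kb/bio.pdf" "$kb/lecture.mp4" "$kb/notes.xyz"
supported=$(find "$kb" -type f \( -name '*.txt' -o -name '*.pdf' \
  -o -name '*.docx' -o -name '*.mp3' -o -name '*.mp4' \
  -o -name '*.ppt' -o -name '*.png' \) | sort)
echo "$supported"   # notes.xyz is filtered out as unsupported
rm -rf "$kb"
```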

Limitations

If the bot receives heavy traffic from multiple simultaneous users, it consumes significant energy (electricity) and hardware resources, so it must be scaled with both compute and environmental dimensions in mind.

Methodology

Before deploying it to an online server, we shall develop the containerized solution locally.

Table 1. Tech stack

Component: Ubuntu 24.04.3 LTS OS
License: Linux distribution based primarily on free and open-source software
Purpose: Host operating system; the primary environment where all the other components are installed
Notes: Simple to install and replicate

Component: Ollama
License: MIT License; almost entirely unrestricted and designed to be used by anyone, for any purpose
Purpose: The core framework that downloads and runs models
Notes: It manages the LLMs

Component: Docker
License: Open source, Apache-2.0 License
Purpose: Runs the OpenWebUI application in a self-contained unit (a container) entirely separate from the Ubuntu operating system
Notes: Safely contains the OpenWebUI component

Component: OpenWebUI
License: Free to use, modify, and redistribute for personal projects, businesses, or internal use, as long as the "Open WebUI" branding is not removed or altered
Purpose: Web-based chat interface designed to manage and interact with LLMs; provides a feature-rich, ChatGPT-style experience for self-hosted AI deployments, prioritizing privacy, data sovereignty, and customizability
Notes: Gives the ChatGPT user experience

Component: ngrok
License: Free to use, not open source
Purpose: A secure, public gateway to the internet, so that your local AI can be accessed through any browser or device in the world
Notes: Gives your bot a shareable and persistent URL

Component: Llama3 8B
License: Meta Llama Community License (a bespoke, conditional open-source license); generally permissive for both research and commercial use
Purpose: Llama 3 is a family of powerful open-source large language models developed by Meta; the most accessible version for running on consumer hardware (laptops, desktops) is Llama 3 8B (8 billion parameters)
Notes: While Llama 4 is now available, it requires massive resources; Llama 3 is more appropriate. You can also use DeepSeek, Granite, Gemma, etc.

Component: LLaVA
License: MIT License; free to use
Purpose: Large Language and Vision Assistant, a prominent open-source model that combines a large language model (like Llama) with a vision encoder to handle image and text inputs
Notes: Used to interpret screenshots, diagrams, handwritten text, and images

Component: The Divine Comedy
License: Public domain, no copyright
Purpose: The knowledge base of our chatbot
Notes: Our subject-matter source knowledge

Component: Dante Alighieri: bio-bibliographie
License: Public domain, no copyright
Purpose: Another text file for knowledge
Notes: More topic-specific knowledge to enrich answers; written in Italian, perfect to test multilingual capabilities

LLaVA will be used to generate embeddings from the screenshots.

Installation steps:

  • Install Ubuntu
  • Install Ollama
  • Install Docker
  • Install OpenWebUI
  • Install LLaVA
  • Install Docling
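The installation steps can be sketched as a dry-run script that only prints the commands, so each one can be reviewed before running. The exact installer URL, Docker image name, port mapping, and model tags are illustrative assumptions; check each project's current documentation:

```shell
#!/usr/bin/env bash
# Dry-run sketch of the installation: prints the commands rather than
# executing them. Flags, tags, and image names are illustrative.
set -euo pipefail

cmds=(
  "curl -fsSL https://ollama.com/install.sh | sh"   # Ollama installer
  "sudo apt-get install -y docker.io"               # Docker from Ubuntu repos
  "docker run -d -p 8080:8080 -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main"
  "ollama pull llama3:8b"                           # 8B chat model
  "ollama pull llava"                               # vision model
  "pip install docling"                             # document converter
)
printf '%s\n' "${cmds[@]}"
```

The port mapping 8080:8080 is chosen to match the http://localhost:8080/ address used later in this document.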

After installation, the daily workflow is as follows.

First, launch Ollama on Ubuntu:

sudo systemctl start ollama

Verify Ollama is running:

sudo systemctl status ollama

Launch OpenWebUI:

docker start open-webui

On a browser on the same machine, navigate to http://localhost:8080/

To stop working on it, shut down OpenWebUI and Ollama:

docker stop open-webui
sudo systemctl stop ollama
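The start/stop commands above can be wrapped in a small helper script (a sketch; the service name `ollama` and container name `open-webui` assume the setup described in this document). To keep the example side-effect free, the helper defaults to printing what it would run:

```shell
#!/usr/bin/env bash
# Helper for bringing the stack up or down. DRY_RUN=1 (the default here)
# prints the commands instead of executing them; set DRY_RUN=0 on a real host.
set -euo pipefail
DRY_RUN=${DRY_RUN:-1}
run() { [ "$DRY_RUN" = "1" ] && echo "+ $*" || "$@"; }

stack() {
  case "$1" in
    up)   run sudo systemctl start ollama; run docker start open-webui ;;
    down) run docker stop open-webui; run sudo systemctl stop ollama ;;
  esac
}

stack up
stack down
```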

To be able to learn from screenshots, diagrams, and image files in general, we install LLaVA:

ollama run llava

To make the bot reachable from anywhere, expose OpenWebUI through ngrok on a reserved domain:

ngrok http 8080 --domain=nonblamable-nonsingular-rachael.ngrok-free.dev


When a user uploads an image, it is processed by LLaVA (Large Language and Vision Assistant).

Hardware Resources

Running LLaVA plus the LLM under Ollama is resource intensive: roughly 14 GB of RAM plus 5 GB of VRAM.
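As a quick sanity check before deploying, a short script can compare the host's total RAM against that figure (the 14 GB threshold comes from the observation above; adjust it for your own model mix):

```shell
#!/usr/bin/env bash
# Check that the host has enough RAM headroom for Llama3 8B + LLaVA.
# The 14 GB figure is the rough requirement observed above.
total_mb=$(free -m | awk '/^Mem:/{print $2}')
needed_mb=$((14 * 1024))
if [ "$total_mb" -ge "$needed_mb" ]; then
  echo "OK: ${total_mb} MB RAM available"
else
  echo "WARNING: only ${total_mb} MB RAM; ${needed_mb} MB recommended"
fi
```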

Multilingual capabilities

If the selected model was trained on multiple languages, the bot will be multilingual out of the box; just prompt it in the desired language.

Context Supplementation with Web Search

We add web search capabilities to the bot so it can expand its context before answering; we choose Tavily (a free web search API).
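To show the shape of such a request, here is a sketch that builds and prints a search payload; the endpoint, header, and field names are assumptions to be checked against Tavily's API documentation, and TAVILY_API_KEY is a placeholder. The actual call is left commented out:

```shell
#!/usr/bin/env bash
# Sketch of a Tavily-style search request. Only builds and prints the JSON
# payload; uncomment the curl lines (and set TAVILY_API_KEY) to send it.
set -euo pipefail
query="Who guides Dante through Inferno?"
payload=$(cat <<EOF
{"query": "${query}", "max_results": 3}
EOF
)
echo "$payload"
# curl -s -X POST https://api.tavily.com/search \
#   -H "Authorization: Bearer ${TAVILY_API_KEY}" \
#   -H "Content-Type: application/json" -d "$payload"
```

In practice, OpenWebUI can also be configured to use such a search provider directly from its admin settings, without hand-written requests.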

Results


Test the local AI at https://nonblamable-nonsingular-rachael.ngrok-free.dev/

Conclusion

References

Support

Need help setting up this solution on your site?

email me
