Offline LLMs in Linux | Ollama on Linux | Easy Setup Guide

Siddhant Bali, an aspiring tech entrepreneur, is an Undergraduate Research Scholar at IIIT Delhi, currently pursuing a B.Tech in Computer Science Engineering with a focus on design (CSD). Excelling in college activities and event management, Siddhant's entrepreneurial spirit propels him into innovative ventures. Connect on LinkedIn or reach out at siddhant22496@iiitd.ac.in for more info.
What is Ollama?
Ollama is a tool to run LLMs (Large Language Models) locally on your computer with just one command. It's beginner-friendly, supports offline usage, and works on most modern Linux systems.
Features
Supports models like LLaMA 3, Mistral, Phi-3, Code LLaMA, and more
CLI-based: clean and fast
Works on CPU or GPU
Easy install and model usage
Free and open-source
System Requirements
OS: Linux (Ubuntu, Fedora, Arch, etc.)
Memory: 8 GB+ RAM recommended
CPU: Any modern x86_64 processor
(Optional) GPU: For faster performance (NVIDIA preferred)
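To compare a machine against these requirements, the architecture and total memory can be read from the system. The following is a minimal sketch, assuming a Linux system that exposes /proc/meminfo; the helper names kb_to_gib and check_system are illustrative, and the 8 GiB threshold simply mirrors the recommendation above.

```shell
#!/bin/sh
# Convert a kB value (as reported in /proc/meminfo) to GiB, rounded to nearest.
kb_to_gib() {
  echo $(( ($1 + 524288) / 1048576 ))
}

# Print the architecture and total RAM, and flag machines below the guideline.
check_system() {
  arch=$(uname -m)
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  mem_gib=$(kb_to_gib "$mem_kb")
  echo "Architecture: $arch"
  echo "Total RAM: ${mem_gib} GiB"
  if [ "$mem_gib" -lt 8 ]; then
    echo "Note: under 8 GiB of RAM; smaller models such as phi3 are a safer start."
  fi
}

# Only run the check where /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
  check_system
fi
```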
1. Install Ollama
Run this command:
curl -fsSL https://ollama.com/install.sh | sh
This:
Installs the Ollama CLI
Sets up the system service
Adds the ollama user and group
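After the script finishes, it can be worth confirming that the CLI is on the PATH and that the service is running. The sketch below is one way to do that, assuming a systemd-based distribution; check_cmd is an illustrative helper, not part of Ollama.

```shell
#!/bin/sh
# Report whether a command is available on PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found"
  fi
}

check_cmd ollama

# If the CLI is present, print its version and the state of the service.
if command -v ollama >/dev/null 2>&1; then
  ollama --version
  if command -v systemctl >/dev/null 2>&1; then
    # Prints "active" when the ollama service is running.
    systemctl is-active ollama || true
  fi
fi
```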
2. Run a Model
Example (LLaMA 3):
ollama run llama3
It will:
Pull the model automatically (first time only)
Start an interactive chat in your terminal
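Besides the interactive chat, ollama run also accepts the prompt as an argument: it prints the answer and exits, which is convenient in scripts. The wrapper below is a small sketch of that usage; the function name ask and the OLLAMA_BIN variable are hypothetical and exist only so the wrapper can be tried with a stand-in command.

```shell
#!/bin/sh
# ask MODEL PROMPT...: send a single prompt and print the reply.
# OLLAMA_BIN is a hypothetical override for this example; by default the
# real ollama CLI is called.
ask() {
  model="$1"
  shift
  "${OLLAMA_BIN:-ollama}" run "$model" "$*"
}

# Example (requires Ollama; the model is downloaded on first use):
#   ask llama3 "Explain what a systemd service is in one sentence."
```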

3. Try Other Models
| Model Name | Size (Params) | Type | Strengths | Use Case Examples | Command |
|---|---|---|---|---|---|
| LLaMA 3 | 8B / 70B | General-purpose | Balanced reasoning, long context | Chatbots, coding, general Q&A | ollama run llama3 |
| Mistral | 7B | General-purpose | Fast, good quality | Lightweight assistant, dev tools | ollama run mistral |
| Phi-3 | 3.8B | Lightweight | Extremely small, fast | Mobile devices, embedded use, casual chat | ollama run phi3 |
| Code LLaMA | 7B / 13B | Code-focused | Best for programming tasks | Code generation, debugging | ollama run codellama |
| LLaMA 2 | 7B / 13B | General-purpose | Earlier version, still powerful | Chat, essays, summarization | ollama run llama2 |
| Gemma | 2B / 7B | Google model | Fast & aligned | Chat, education, summarization | ollama run gemma |
| Neural Chat | 7B | Chat optimized | Tuned for conversational flow | Personal assistant, Q&A | ollama run neural-chat |
ollama run mistral
ollama run phi3
ollama run codellama
ollama run llama2
List All Installed Models:
ollama list
Remove a Model:
ollama rm mistral
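Models can also be downloaded ahead of time with ollama pull, without opening a chat session, which helps when preparing a machine for offline use. The loop below is a sketch under that assumption; pull_all and OLLAMA_BIN are hypothetical names used for illustration.

```shell
#!/bin/sh
# pull_all MODEL...: download each model without starting a chat.
# OLLAMA_BIN is a hypothetical override for this example; by default the
# real ollama CLI is called.
pull_all() {
  for model in "$@"; do
    "${OLLAMA_BIN:-ollama}" pull "$model" || echo "failed to pull $model" >&2
  done
}

# Example (requires Ollama and a network connection):
#   pull_all mistral phi3
```

Running ollama list afterwards should show each model that was pulled successfully.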
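4. Use as a Local API (Optional)
Start the Ollama server (skip this step if the system service set up by the installer is already running, since it already listens on port 11434):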
ollama serve
Use the HTTP API to query models:
curl http://localhost:11434/api/generate -d '{
"model": "llama3",
"prompt": "What is the capital of India?"
}'
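By default, /api/generate streams its answer as a sequence of JSON objects. Setting "stream" to false returns a single JSON object instead, which is easier to handle in scripts. The sketch below wraps that request; build_payload, generate, and the OLLAMA_URL variable are hypothetical names for this example, and the payload builder does no JSON escaping.

```shell
#!/bin/sh
# Build the JSON body for /api/generate with streaming turned off.
# Limitation of this sketch: the prompt must not contain double quotes
# or backslashes, because no JSON escaping is performed.
build_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false}' "$1" "$2"
}

# generate MODEL PROMPT: POST the request to the local server.
# OLLAMA_URL is a hypothetical variable used only in this example.
generate() {
  curl -s "${OLLAMA_URL:-http://localhost:11434}/api/generate" \
    -d "$(build_payload "$1" "$2")"
}

# Example (requires a running server and the llama3 model):
#   generate llama3 "What is the capital of India?"
```

The generated text is in the "response" field of the returned object. If jq is installed, piping the output through jq -r .response prints just that text.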
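Model Location
Where models are stored depends on how the server runs. When you run ollama serve as your own user:
~/.ollama/models
When the server runs as the system service created by the install script:
/usr/share/ollama/.ollama/models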
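To see which directory is actually in use and how much disk space the models take, the likely locations can be checked in order. This is a sketch, assuming the OLLAMA_MODELS override (if set), then ~/.ollama/models, then /usr/share/ollama/.ollama/models, which is the service account's directory on script-based installs; find_models_dir is an illustrative helper.

```shell
#!/bin/sh
# Print the first existing models directory among the likely locations.
find_models_dir() {
  for dir in "${OLLAMA_MODELS:-}" "$HOME/.ollama/models" /usr/share/ollama/.ollama/models; do
    if [ -n "$dir" ] && [ -d "$dir" ]; then
      echo "$dir"
      return 0
    fi
  done
  return 1
}

if dir=$(find_models_dir); then
  # du -sh prints the total size in human-readable form; reading the
  # service account's directory may require sudo.
  du -sh "$dir" 2>/dev/null || echo "Found $dir but could not read it."
else
  echo "No models directory found yet."
fi
```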
Uninstall Ollama
sudo systemctl stop ollama
sudo systemctl disable ollama
sudo rm /etc/systemd/system/ollama.service
sudo rm -rf /usr/local/bin/ollama /usr/local/lib/ollama /usr/share/ollama ~/.ollama
sudo userdel ollama
sudo groupdel ollama
Tips
Pick a quantization that fits your hardware: Q4_K_M needs less memory and runs faster, while Q8_0 preserves more quality at a larger size
You can run Ollama completely offline after the model is downloaded
Combine it with tools like LM Studio, KoboldCpp, or Streamlit apps



