If you're ready to move beyond the terminal and start integrating Ollama into your own apps, you'll want to get familiar with the Ollama API endpoint — and how to use it with Python.
In this guide, I’ll show you how to call Ollama programmatically using both curl and Python. Whether you’re building a local chatbot, scripting bulk generations, or wiring up an AI-powered tool, Ollama’s HTTP interface makes it simple.
The Ollama Endpoint
Once Ollama is installed and running, it exposes a local API on http://localhost:11434.
To generate a response from a model, make a POST request to the /api/generate endpoint. Here’s what it looks like with curl:
curl http://localhost:11434/api/generate -d '{
  "model": "phi3",
  "prompt": "Why is the sky blue?"
}'
This sends your prompt to the phi3 model and streams the response back.
Let’s break that down:
- model: The name of the model you want to use (must already be installed via ollama run or ollama pull)
- prompt: Your input or question
The response is streamed as newline-delimited JSON chunks, which makes it ideal for real-time interfaces or applications.
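For reference, each chunk is a small JSON object with the model name, a timestamp, a piece of the generated text, and a done flag. Abbreviated, the stream looks roughly like this:
{"model":"phi3","created_at":"...","response":"The","done":false}
{"model":"phi3","created_at":"...","response":" sky","done":false}
{"model":"phi3","created_at":"...","response":"","done":true}
The final chunk sets done to true and also carries generation statistics such as token counts and timings.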
🧠 Tip: You can swap "phi3" with any model you’ve installed locally, like "llama3" or "mistral". Use ollama list to see what’s available.
Using the Ollama API in Python
Ollama's local API makes it easy to integrate models into your own Python scripts. Let’s start with a simple request-response flow, then move on to streaming.
Basic Response (Non-Streaming)
If you just want the full response back — no fancy streaming — you can send a regular POST request and read the result once it's done:
import requests

url = 'http://localhost:11434/api/generate'
# Setting 'stream' to False asks Ollama to return the whole answer as one JSON object
data = {'model': 'phi3', 'prompt': 'Why is the sky blue?', 'stream': False}

response = requests.post(url, json=data)
print(response.json()['response'])
This works like a traditional API call: because the request sets "stream" to False, Ollama waits until generation finishes and sends the full answer back in a single JSON object.
🧠 Tip: This method is easier to debug and great for quick scripts, one-off generations, or logging results.
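If you call the endpoint from several places, it can be convenient to wrap the request in a small helper. Here's a minimal sketch; the generate_text name and its defaults are my own, not part of Ollama:

import requests

def generate_text(prompt, model='phi3', url='http://localhost:11434/api/generate'):
    # Ask for the complete answer in one shot by disabling streaming
    payload = {'model': model, 'prompt': prompt, 'stream': False}
    response = requests.post(url, json=payload, timeout=120)
    response.raise_for_status()  # Surface HTTP errors early instead of failing later
    return response.json()['response']

print(generate_text('Why is the sky blue?'))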
Streaming Response (Live Output)
For a more interactive feel — like seeing the response unfold in real time — you can stream the output instead:
import requests
import json

url = 'http://localhost:11434/api/generate'
data = {'model': 'phi3', 'prompt': 'Why is the sky blue?'}

with requests.post(url, json=data, stream=True) as response:
    # Each line of the stream is one JSON chunk containing a piece of the answer
    for line in response.iter_lines():
        if line:
            response_data = json.loads(line)
            if response_data.get('response'):
                print(response_data['response'], end='', flush=True)
This streams back each chunk of the model’s response as it's generated — perfect for building chat apps, terminal tools, or anything that benefits from live feedback.
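If you plan to reuse that streaming logic, say in a terminal chat loop, one option is to wrap it in a generator that yields text as it arrives. This is just a sketch, and the stream_generate name is my own rather than anything Ollama provides:

import json
import requests

def stream_generate(prompt, model='phi3', url='http://localhost:11434/api/generate'):
    # Yield each piece of the answer as soon as Ollama sends it
    payload = {'model': model, 'prompt': prompt}
    with requests.post(url, json=payload, stream=True) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            if line:
                chunk = json.loads(line)
                if chunk.get('response'):
                    yield chunk['response']

# Print the answer piece by piece, then finish with a newline
for piece in stream_generate('Why is the sky blue?'):
    print(piece, end='', flush=True)
print()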
Why Use the Ollama API?
Running models locally with an API gives you a lot of flexibility:
- Build custom UIs or command-line tools
- Create bots or assistants that run entirely offline
- Script workflows using local generation
- Avoid latency and privacy concerns of cloud models
And because it’s just HTTP, you can use any language, not just Python.
Next Steps
- Want more control? Check out the full Ollama API reference.