This is a post about using Ollama for local inference with AI agents that call custom Python tools.
Tools are Python functions you can give to your LLM, which it can then execute to gather additional information or to perform tasks. You hand your inference provider the tool definitions, and the LLM requests tool calls during its generation loop: after a tool call, generation stops, you provide the tool's return value, and the LLM resumes generating. In this tutorial we are going to use Ollama because of its ease of use, but the same is possible with llama.cpp, which has the significant advantage of being independent of an app such as Ollama.
Install/Update the necessary libraries first:
pip install ollama -U
or add this line to a requirements.txt
ollama>0.3.0
and install it via:
pip install -r requirements.txt
Also make sure you have installed the Ollama app and that it's up and running.
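Optionally, you can check from Python that the server is reachable; if the app is not running, the following raises a connection error:
import ollama

# lists the locally installed models; only works if the Ollama server is running
print(ollama.list())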
To use an LLM, first pull the model, either from Hugging Face's model list or from Ollama's own library. If it's a model from Hugging Face, follow the instructions under "Use this model" > "Ollama". For a model from the Ollama library, run
ollama pull <Model name>
with <Model name> being 'gpt-oss:20b', or whatever model you want to use. After successfully pulling the model, we can finally use it in our Python code.
from ollama import chat

response = chat(model='gpt-oss:20b', messages=[
    {
        'role': 'user',
        'content': 'Explain how llms work.',
    },
])

print(response['message']['thinking'])  # thinking content
print(response['message']['content'])  # message content
After importing it, we use the Ollama chat API to ask the model gpt-oss:20b to explain how LLMs work. Once the synchronous function has executed, we can print the results. The Ollama chat API returns a response object of the following structure:
{
    "message": {
        "role": "assistant",  // or "tool"; in messages you write yourself it is "user", or "system" for the system prompt
        "content": "...",     // response text content, can be None with thinking models
        "thinking": "...",    // the model's internal thoughts, only present with thinking models, can be None
    },
    "logprobs": [  // only if requested in the chat call with logprobs=True
        Logprob(token='.', logprob=-0.005420973990112543, top_logprobs=None),  // probability of each token
        ...
    ]
}
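The same fields can also be read as attributes on the response object (in recent versions of the ollama library it is a pydantic model), which is the style we will use later for tool calls. A small sketch, assuming gpt-oss:20b is available locally:
from ollama import chat

response = chat(model='gpt-oss:20b', messages=[
    {'role': 'user', 'content': 'Explain how llms work.'},
])

# attribute-style access, equivalent to the dict-style subscripts above
if response.message.thinking:  # thinking is None for non-thinking models
    print(response.message.thinking)
print(response.message.content)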
To create a simple chat that goes back and forth, we need to keep a message history in the form of a list. Messages follow the same format as the response, with the exceptions mentioned above. We can also specify overall rules via the "system" role. The system prompt is normally only used once and set at the beginning of the chat. Python code with history would therefore look as follows:
from ollama import chat

history = []
history.append({
    'role': 'system',
    'content': 'keep your messages short and concise'
})

user_msg = input('Chat:')
while user_msg != '':
    history.append({
        'role': 'user',
        'content': user_msg
    })
    response = chat(model='gpt-oss:20b', messages=history)
    print(response['message']['thinking'])
    print('-----------------------------')
    print(response['message']['content'])
    history.append({
        'role': response['message']['role'],
        'content': response['message']['content']
    })  # do not add thinking to history
    user_msg = input('Chat:')
We initialize the history, add a system prompt to keep all messages short and concise, then ask the user for input, use that input to generate an answer, and print it, until the user finally gets bored and exits by leaving the input empty.
As mentioned, the AI can also utilize tools to enrich the conversation. To give it this ability, we first define the problem we want to solve and then create as few functions as possible, kept as simple as possible, to achieve our goal. Let's say we want our AI to take notes. For the AI to be able to do this, we create a function with a descriptive name such as "take_note". We then think of a short description and add it to the docstring, together with a specification of the inputs. Even though Python doesn't enforce type annotations, it's still good to specify the types, to give the LLM hints about what types to use.
def take_note(title: str, content: str) -> str:
    """
    take notes, to remember important things.
    title:str
        the title of your note
    content:str
        the content of your note
    """
    notes.append(f"# {title}\n{content}")
    history.append({
        'role': 'system',
        'content': 'keep your messages short and concise\n\nYou took the following notes:\n- ' + '\n- '.join(notes)
    })
    return "note taken successfully!"
Similarly, we create the function "multiply", because LLMs are inherently bad at arithmetic.
def multiply(a: float, b: float) -> float:
    """
    multiply two floats and get the result
    a:float
        the first float
    b:float
        the second float
    """
    return a * b
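And a one-line sanity check for multiply:
print(multiply(6.0, 7.0))  # 42.0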
For simplicity, and to make it easy to change the tool set later, we store all our tools in a dictionary called tools:
tools = {
    "take_note": take_note,
    "multiply": multiply
}
We then hand these functions to the chat API via its "tools" argument:
response = chat(model='gpt-oss:20b', messages=history, tools=tools.values())
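Passing plain Python functions works because recent versions of the ollama library build the tool schema from the signature and docstring. If you want full control over what the model sees, you can also pass an OpenAI-style JSON schema yourself; a minimal sketch of what such a definition could look like for multiply (treat the exact layout as an assumption to check against your ollama version):
multiply_schema = {
    'type': 'function',
    'function': {
        'name': 'multiply',
        'description': 'multiply two floats and get the result',
        'parameters': {
            'type': 'object',
            'properties': {
                'a': {'type': 'number', 'description': 'the first float'},
                'b': {'type': 'number', 'description': 'the second float'},
            },
            'required': ['a', 'b'],
        },
    },
}

# could then be passed instead of (or alongside) the callables:
# response = chat(model='gpt-oss:20b', messages=history, tools=[multiply_schema])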
But that's not all. As I said at the beginning, the AI can now request tool calls, but Ollama doesn't execute them for us. We need to call the function ourselves and feed the result back as a message with the role "tool". We then keep generating until the AI no longer calls any tools.
if response.message.tool_calls:  # check if the AI called a tool
    for call in response.message.tool_calls:
        result = 'Unknown tool'  # fallback value if the tool does not exist
        for tool in tools.keys():
            if call.function.name == tool:
                result = tools[tool](**call.function.arguments)  # call the tool with the specified arguments
        history.append({  # add the result to the history
            'role': 'tool',
            'tool_name': call.function.name,
            'content': str(result)
        })
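For reference, each entry in response.message.tool_calls carries the function name and the already parsed arguments as a dictionary; a call to our multiply tool is represented roughly like this (the exact repr depends on the library version):
ToolCall(function=Function(name='multiply', arguments={'a': 6, 'b': 7}))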
We put the above in a loop. The moment the AI doesn't call any tools, we exit. The full function-calling code therefore looks like this:
from ollama import chat
from ollama import ChatResponse

history = []
notes = []

def take_note(title: str, content: str) -> str:
    """
    take notes, to remember important things.
    title:str
        the title of your note
    content:str
        the content of your note
    """
    notes.append(f"# {title}\n{content}")
    history.append({
        'role': 'system',
        'content': 'keep your messages short and concise\n\nYou took the following notes:\n- ' + '\n- '.join(notes)
    })
    return "note taken successfully!"

def multiply(a: float, b: float) -> float:
    """
    multiply two floats and get the result
    a:float
        the first float
    b:float
        the second float
    """
    return a * b

history.append({
    'role': 'system',
    'content': 'keep your messages short and concise. Use tools whenever necessary.'
})

user_msg = input('Chat:')
tools = {
    "take_note": take_note,
    "multiply": multiply
}

while user_msg != '':
    history.append({
        'role': 'user',
        'content': user_msg
    })
    while True:  # loop for as long as the AI calls tools
        response = chat(model='gpt-oss:20b', messages=history, tools=tools.values())
        history.append({
            'role': response['message']['role'],
            'content': response['message']['content']
        })  # do not add thinking to history
        if response.message.tool_calls:  # if the AI called a tool
            for call in response.message.tool_calls:
                result = 'Unknown tool'  # fallback value if the tool does not exist
                for tool in tools.keys():
                    if call.function.name == tool:
                        print("used tool", tool)
                        result = tools[tool](**call.function.arguments)  # call the tool with the specified arguments
                history.append({  # add the result to the history
                    'role': 'tool',
                    'tool_name': call.function.name,
                    'content': str(result)
                })
        else:  # if the AI finished without a tool call, end the generation loop
            break
    print(response['message']['thinking'])
    print('-----------------------------')
    print(response['message']['content'])
    user_msg = input('Chat:')
We also changed the system prompt to make the AI use the provided tools.
So now you can use Ollama to create your own local, agentic AI models. If you liked this tutorial, feel free to check back from time to time for our other tutorials.