This is a post about using Ollama for local inference with AI agents that call custom Python tools.
Tools are Python functions you can give to your LLM, which it can then execute to gather additional information or to perform tasks. You hand your inference provider the tool definitions, and the LLM requests tool calls during its generation loop: after a tool call, generation stops, you provide the tool's return value, and the LLM resumes generating. In this tutorial we are going to use Ollama because of its ease of use, but the same is possible with llama.cpp, which has the significant advantage of being independent of an app such as Ollama.
Install/Update the necessary libraries first:
pip install ollama -U
or add this line to a requirements.txt
ollama>0.3.0
and install it via:
pip install -r requirements.txt
Also make sure you have installed the Ollama app and that it's up and running.
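Optionally, you can check from Python that the server is reachable; if the app is not running, the following raises a connection error:
import ollama

# lists the locally installed models; only works if the Ollama server is running
print(ollama.list())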
To use an LLM, first pull the model, either from Hugging Face's model list or from Ollama's own library. If it's a model from Hugging Face, follow the instructions under "Use this model" > "Ollama". For a model from the Ollama library, run
ollama pull <Model name>
with <Model name> being 'gpt-oss:20b', or whatever model you want to use. After successfully pulling the model, we can finally use it in our Python code.
from ollama import chat

response = chat(model='gpt-oss:20b', messages=[
    {
        'role': 'user',
        'content': 'Explain how llms work.',
    },
])

print(response['message']['thinking'])  # thinking content
print(response['message']['content'])  # message content
After importing it, we use the Ollama chat API to ask the model gpt-oss:20b to explain how LLMs work. Once the synchronous function has executed, we can print the results. The Ollama chat API returns a response object of the following structure:
{
    "message": {
        "role": "assistant",  // or "tool"; in messages you write yourself it is "user", or "system" for the system prompt
        "content": "...",     // response text content, can be None with thinking models
        "thinking": "...",    // the model's internal thoughts, only present with thinking models, can be None
    },
    "logprobs": [  // only if requested in the chat call with logprobs=True
        Logprob(token='.', logprob=-0.005420973990112543, top_logprobs=None),  // probability of each token
        ...
    ]
}
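The same fields can also be read as attributes on the response object (in recent versions of the ollama library it is a pydantic model), which is the style we will use later for tool calls. A small sketch, assuming gpt-oss:20b is available locally:
from ollama import chat

response = chat(model='gpt-oss:20b', messages=[
    {'role': 'user', 'content': 'Explain how llms work.'},
])

# attribute-style access, equivalent to the dict-style subscripts above
if response.message.thinking:  # thinking is None for non-thinking models
    print(response.message.thinking)
print(response.message.content)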
To create a simple chat that goes back and forth, we need to keep a message history in the form of a list. Messages follow the same format as the response, with the exceptions mentioned above. We can also specify overall rules via the "system" role. The system prompt is normally only used once and set at the beginning of the chat. Python code with history would therefore look as follows:
from ollama import chat

history = []
history.append({
    'role': 'system',
    'content': 'keep your messages short and concise'
})

user_msg = input('Chat:')
while user_msg != '':
    history.append({
        'role': 'user',
        'content': user_msg
    })
    response = chat(model='gpt-oss:20b', messages=history)
    print(response['message']['thinking'])
    print('-----------------------------')
    print(response['message']['content'])
    history.append({
        'role': response['message']['role'],
        'content': response['message']['content']
    })  # do not add thinking to history
    user_msg = input('Chat:')
We initialize the history, add a system prompt to keep all messages short and concise, then ask the user for input, use that input to generate an answer, and print it, until the user finally gets bored and exits by leaving the input empty.
As mentioned, the AI can also utilize tools to enrich the conversation. To give it this ability, we first define the problem we want to solve and then create as few functions as possible, kept as simple as possible, to achieve our goal. Let's say we want our AI to take notes. For the AI to be able to do this, we create a function with a descriptive name such as "take_note". We then think of a short description and add it to the docstring, together with a specification of the inputs. Even though Python doesn't enforce type annotations, it's still good to specify the types, to give the LLM hints about what types to use.
def take_note(title: str, content: str) -> str:
    """
    take notes, to remember important things.
    title:str
        the title of your note
    content:str
        the content of your note
    """
    notes.append(f"# {title}\n{content}")
    history.append({
        'role': 'system',
        'content': 'keep your messages short and concise\n\nYou took the following notes:\n- ' + '\n- '.join(notes)
    })
    return "note taken successfully!"
Similarly, we create the function "multiply", because LLMs are inherently bad at arithmetic.
def multiply(a: float, b: float) -> float:
    """
    multiply two floats and get the result
    a:float
        the first float
    b:float
        the second float
    """
    return a * b
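And a one-line sanity check for multiply:
print(multiply(6.0, 7.0))  # 42.0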
For simplicity, and to make it easy to change the tool set later, we store all our tools in a dictionary called tools:
tools = {
    "take_note": take_note,
    "multiply": multiply
}
We then hand these functions to the chat API via its "tools" argument:
response = chat(model='gpt-oss:20b', messages=history, tools=tools.values())
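Passing plain Python functions works because recent versions of the ollama library build the tool schema from the signature and docstring. If you want full control over what the model sees, you can also pass an OpenAI-style JSON schema yourself; a minimal sketch of what such a definition could look like for multiply (treat the exact layout as an assumption to check against your ollama version):
multiply_schema = {
    'type': 'function',
    'function': {
        'name': 'multiply',
        'description': 'multiply two floats and get the result',
        'parameters': {
            'type': 'object',
            'properties': {
                'a': {'type': 'number', 'description': 'the first float'},
                'b': {'type': 'number', 'description': 'the second float'},
            },
            'required': ['a', 'b'],
        },
    },
}

# could then be passed instead of (or alongside) the callables:
# response = chat(model='gpt-oss:20b', messages=history, tools=[multiply_schema])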
But that's not all. As I said at the beginning, the AI can now request tool calls, but Ollama doesn't execute them for us. We need to call the function ourselves and feed the result back as a message with the role "tool". We then keep generating until the AI no longer calls any tools.
if response.message.tool_calls:  # check if the AI called a tool
    for call in response.message.tool_calls:
        result = 'Unknown tool'  # fallback value if the tool does not exist
        for tool in tools.keys():
            if call.function.name == tool:
                result = tools[tool](**call.function.arguments)  # call the tool with the specified arguments
        history.append({  # add the result to the history
            'role': 'tool',
            'tool_name': call.function.name,
            'content': str(result)
        })
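For reference, each entry in response.message.tool_calls carries the function name and the already parsed arguments as a dictionary; a call to our multiply tool is represented roughly like this (the exact repr depends on the library version):
ToolCall(function=Function(name='multiply', arguments={'a': 6, 'b': 7}))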
We put the above in a loop. The moment the AI doesn't call any tools, we exit. The full function-calling code therefore looks like this:
from ollama import chat
from ollama import ChatResponse

history = []
notes = []

def take_note(title: str, content: str) -> str:
    """
    take notes, to remember important things.
    title:str
        the title of your note
    content:str
        the content of your note
    """
    notes.append(f"# {title}\n{content}")
    history.append({
        'role': 'system',
        'content': 'keep your messages short and concise\n\nYou took the following notes:\n- ' + '\n- '.join(notes)
    })
    return "note taken successfully!"

def multiply(a: float, b: float) -> float:
    """
    multiply two floats and get the result
    a:float
        the first float
    b:float
        the second float
    """
    return a * b

history.append({
    'role': 'system',
    'content': 'keep your messages short and concise. Use tools whenever necessary.'
})

user_msg = input('Chat:')
tools = {
    "take_note": take_note,
    "multiply": multiply
}

while user_msg != '':
    history.append({
        'role': 'user',
        'content': user_msg
    })
    while True:  # loop for as long as the AI calls tools
        response = chat(model='gpt-oss:20b', messages=history, tools=tools.values())
        history.append({
            'role': response['message']['role'],
            'content': response['message']['content']
        })  # do not add thinking to history
        if response.message.tool_calls:  # if the AI called a tool
            for call in response.message.tool_calls:
                result = 'Unknown tool'  # fallback value if the tool does not exist
                for tool in tools.keys():
                    if call.function.name == tool:
                        print("used tool", tool)
                        result = tools[tool](**call.function.arguments)  # call the tool with the specified arguments
                history.append({  # add the result to the history
                    'role': 'tool',
                    'tool_name': call.function.name,
                    'content': str(result)
                })
        else:  # if the AI finished without a tool call, end the generation loop
            break
    print(response['message']['thinking'])
    print('-----------------------------')
    print(response['message']['content'])
    user_msg = input('Chat:')
We also changed the system prompt to make the AI use the provided tools.
So now you can use Ollama to create your own local, agentic AI models. If you liked this tutorial, feel free to check back from time to time for our other tutorials.