Rise of the Planet of the Agents 🤖: Creating an LLM-Based AI Agent from Scratch!

The Cyb3rWard0g

Ever since I began learning about Generative Artificial Intelligence (GenAI), I've been eager to explore beyond just building AI chatbots for simple question-answering over documents. This curiosity led me to experiment with various LLMs, and through this process, I learned about their potential not only for recalling information but also for reasoning in natural language and enabling autonomous decision-making. This led me into the world of AI autonomous agents. Despite AMAZING libraries like LangChain or LlamaIndex simplifying their creation, I wanted to understand the internals. We all learn in different ways, and for me, that means writing things from scratch. So, I knew I had to write my first agent from scratch, learning step by step through the process.

In this blog post, I share my journey writing my own agent in Python and explore the fundamental concepts behind LLM-based AI autonomous agents. If you're new to the world of GenAI, I highly recommend reading my other post, "Demystifying Generative AI 🤖 A Security Researcher's Notes", to familiarize yourself with the basic terminology and concepts for interacting with LLMs.

What is an Agent? 🐶

In the simplest terms, an agent is anything that can act on its own or make decisions. This could be a person, an animal, or any entity that has the autonomy to choose and act independently.

What is an AI Agent? 🤖

An AI agent is a computer program or machine that can act on its own, making decisions similar to how humans do. It uses data and AI to think through tasks and decides the best action to take. In addition, it can use tools to execute actions and gather new information, enhancing its decision-making process.

What is an LLM-Based AI Agent? 🤖➕🧠

An LLM-Based AI Agent uses an LLM as a reasoning engine to think through complex tasks in natural language. Its actions are not hardcoded; instead, it determines its own steps based on the user's input and past interactions. This agent combines language-based planning with actions to interact with the outside world in a non-deterministic way, much like how humans operate.

Figure 1: A diagram showing a basic representation of an LLM-based Agent.

This is not a new concept; from January 2021 to August 2023, there was a noticeable increase in research papers on this topic.

Figure 2: Illustration of the growth trend in the field of LLM-based autonomous agents. (Image Source: Wang et al. 2023).

A Reasoning Engine? 🎮

Reasoning is the process of thinking through a problem or situation logically. It's how we figure things out, whether it is solving a puzzle or deciding what to do in a particular situation. Reasoning involves analyzing information, evaluating options, and choosing the best course of action based on that analysis.

In an LLM-based AI agent, this ability to reason can be activated by prompts such as "Let's think step by step" or by providing examples of intermediate reasoning steps. This approach enables the agent to break down a task into smaller parts and methodically work through each one before coming up with the best answer.
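
For instance, here is a minimal sketch of a chat payload that nudges the model into step-by-step reasoning (the juggler word problem is the example used by Kojima et al.):

# A simple example of triggering step-by-step reasoning with a zero-shot instruction
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": (
        "A juggler can juggle 16 balls. Half of the balls are golf balls, "
        "and half of the golf balls are blue. How many blue golf balls are there? "
        "Let's think step by step."
    )}
]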

Figure 3: An illustration showing how reasoning can be triggered by basic instructions such as "Let's think step by step" and improve an LLM response. (Source: Kojima et al. 2023).

Is Reasoning Enough? 🤔

After reading about LLMs and their ability to reason, I got curious about how an agent would interact with the real world. How would an agent validate its reasoning or learn new things from its environment? Can it reason reactively or update its knowledge? An Agent needs to do more than just think; it needs to act.

This integration of thought and action is essential for AI agents to interact more effectively with their environments. It's not just about processing information but also about implementing decisions based on reasoning.

Figure 4: An illustration showing that a question may require external information beyond the LLM's parametric knowledge to be answered effectively. (source: Yao et al. 2023).

Anatomy of an LLM-based AI Agent 🤖

After exploring the fundamental concepts of agent reasoning and action, I wanted to outline the key components essential for building a functional LLM-based AI agent. These components form the backbone of how the agent processes information, makes decisions, and interacts with its environment.

Figure 5: An illustration showing the key components for building an LLM-based AI Agent.
  • LLM Core
    • Produces coherent, relevant text.
    • Facilitates complex reasoning and problem-solving.
    • Integrates pre-trained knowledge to inform decisions.
  • Persona
    • Assigns specific roles to guide the agent's behavior.
    • Defines personality traits and instructions.
    • Additional attributes shaping the agent's identity.
  • Memory Management
    • Maintains relevant context across interactions.
    • Customizes responses based on user preferences.
    • Leverages past interactions for better responses.
  • Reasoning and Decision-Making
    • Formulates step-by-step strategies to achieve goals.
    • Simplifies complex tasks into sub-tasks.
    • Adapts to current environment.
    • Evaluates past actions to refine future strategies.
  • Action Execution
    • Creates specific actions based on decisions and plans.
    • Employs external tools to enhance action effectiveness.
    • Interacts with environment to execute tasks and get feedback.

Designing a Functional LLM-based AI Agent 🤖

After documenting the key components of an AI agent, it was time to write some code. For my initial basic design, I needed a client to communicate with LLMs, a system to store and manage memory, tools and mechanisms for execution, and a way to define plans and personas.

Figure 6: An illustration showing some applications enabling the agent components.
  • Generator: A client to interact with LLMs, primarily for generating text. It validates user input and processes structured responses.
  • Memory: Enables both short-term and long-term memory management. This system can range from simple lists to advanced vector or graph databases, storing and retrieving interaction history as needed.
  • Tool Formatter: Uses a decorator to format generic Python functions into a structured format that aligns with the AI agent's requirements for tool invocation. Remember, LLMs do not invoke tools, agents do.
  • Tool Executor: Manages tool registration, maintains a catalog, and ensures tools are available and executed correctly.
  • Prompt Formatter: Prepares prompt templates with chat history, tool details, and execution logs to ensure all essential information is included for effective reasoning and decision-making.
  • AI Agent: Integrates all components, orchestrates their functions, and manages the agent's workflow. It initializes the agent, processes prompt templates, and uses the tool executor to execute actions.

The Agentic Journey Begins! 🧭

Let's now dive into the journey of creating an AI agent from scratch, exploring key components, challenges, and fundamental concepts along the way.

Figure 7: An image showing a dog robot with its human creator.

The Agent Base Class 🤖

The first component I defined was the Agent class itself. This foundational class was designed to be flexible and extensible, serving as a template for integrating more complex features as development progressed.

class Agent:
    """
    Integrates LLM client, tools, and memory.
    """
    def __init__(self, llm_client=None, tools=None, memory=None):
        self.llm_client = llm_client
        self.tools = tools
        self.memory = memory

    def run(self):
        """
        Placeholder for the agent's main execution logic.
        """
        pass

Building the LLM Client 📡

Next, I created a class to interact with LLMs, primarily from OpenAI, due to their models' performance. I'll cover local LLMs in a future post.

💡
This component requires an OpenAI API key and the OpenAI Python library for convenient access to the OpenAI REST API.

The OpenAI Chat Completion Class

This class interacts with the OpenAI Chat completion API, sending messages and parsing responses to handle various message types effectively.

from typing import Dict, Any, List
import openai

class OpenAIChatCompletion:
    """
    Interacts with OpenAI's API for chat completions.
    """
    def __init__(self, model: str, api_key: str = None, base_url: str = None):
        """
        Initialize with model, API key, and base URL.
        """
        self.client = openai.OpenAI(api_key=api_key, base_url=base_url)
        self.model = model

    def generate(self, messages: List[Dict], **kwargs) -> Dict[str, Any]:
        """
        Generate a response from input messages.
        """
        params = {'messages': messages, 'model': self.model, **kwargs}
        response = self.client.chat.completions.create(**params)
        return response.choices[0].message

🚦Assembly Line: Integrating LLM Client with AI Agent 🤖

This enables the agent to leverage the LLM client for generating responses.

class Agent:
    """
    Integrates LLM client, tools, and memory.
    """
    def __init__(self, llm_client, tools=None, memory=None):
        self.llm_client = llm_client
        self.tools = tools
        self.memory = memory

    def run(self, messages: List[Dict[str, str]]):
        """
        Generates a response from the LLM client.
        """
        # Generate response using the LLM client
        response = self.llm_client.generate(messages)
        return response

Testing the Updated Agent Class 🔬

To securely handle the API key, set an OPENAI_API_KEY variable in a .env file and use Python-dotenv to load it as an environment variable. The OpenAI library will automatically detect and use this variable.
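
For reference, the .env file only needs a single line (the value below is a placeholder for your actual key):

OPENAI_API_KEY=sk-your-key-here

Then load it in Python: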

from dotenv import load_dotenv
load_dotenv()  # Load environment variables from .env.

Next, initialize the new class, define a few messages, and run the agent:

# Initialize the OpenAI client
client = OpenAIChatCompletion(base_url='https://api.openai.com/v1', model='gpt-4')

# Define a set of messages
messages = [
    {"role": "system", "content": "You are a security assistant."},
    {"role": "user", "content": "Hey! This is Roberto!"}
]

# Initialize the agent with the LLM client
myAgent = Agent(llm_client=client)

# Use the agent to run a task
response = myAgent.run(messages=messages)
print(response)
ChatCompletionMessage(content='Hello Roberto! How can I assist you today regarding security matters?', role='assistant', function_call=None, tool_calls=None)

Implementing Short-Term Memory 💾

After setting up the LLM client, I focused on enabling the agent to recall previous interactions. Short-term memory handles recent conversations within the LLM's context window, while long-term memory uses databases like vector stores or graphs for retrieving extensive chat history, ensuring relevant context.

Figure 8: An illustration showing a basic short-term memory flow.

The Short-Term Memory Class

This class manages a list of chat messages and provides access to recent interactions.

from typing import List, Dict

class ChatMessageMemory:
    """Manages conversation context."""
    
    def __init__(self):
        self.messages = []
    
    def add_message(self, message: Dict):
        """Add a message to memory."""
        self.messages.append(message)
    
    def add_messages(self, messages: List[Dict]):
        """Add multiple messages to memory."""
        for message in messages:
            self.add_message(message)
    
    def add_conversation(self, user_message: Dict, assistant_message: Dict):
        """Add a user-assistant conversation."""
        self.add_messages([user_message, assistant_message])
    
    def get_messages(self) -> List[Dict]:
        """Retrieve all messages."""
        return self.messages.copy()
    
    def reset_memory(self):
        """Clear all messages."""
        self.messages = []

🚦Assembly Line: Integrating Memory with AI Agent 🤖

With the LLM client and memory class ready, the next step was to integrate memory into the Agent class. The agent initializes with the system message, and user messages are added to memory before merging. This allows efficient generation, storage, and retrieval of conversation context.

class Agent:
    """Integrates LLM client, tools, and memory."""
    
    def __init__(self, llm_client, system_message: Dict[str, str], tools=None):
        self.llm_client = llm_client
        self.tools = tools
        self.memory = ChatMessageMemory()
        self.system_message = system_message

    def run(self, user_message: Dict[str, str]):
        """Generate a response using LLM client and store context."""
        self.memory.add_message(user_message)
        chat_history = [self.system_message] + self.memory.get_messages()
        response = self.llm_client.generate(chat_history)
        self.memory.add_message(response)
        return response

Testing the Updated Agent Class 🔬

# Test the updated agent
client = OpenAIChatCompletion(base_url='https://api.openai.com/v1', model='gpt-4')

# Define the system message
system_message = {"role": "system", "content": "You are a security assistant."}

# Initialize the Agent with the LLM client and system message
agent = Agent(llm_client=client, system_message=system_message)

# Define a user message
user_message = {"role": "user", "content": "Hey! This is Roberto!"}

# Generate a response using the agent
response = agent.run(user_message)

# Add a follow-up message to test memory integration
follow_up_message = {"role": "user", "content": "What was my name?"}

# Run the agent again with the new message
response = agent.run(follow_up_message)
print(response)
ChatCompletionMessage(content='Your name is Roberto.', role='assistant', function_call=None, tool_calls=None)

Enabling Tool Execution v1 🛠️

After setting up LLM interaction and memory integration, the next step was enabling tool use. The goal was for the LLM to help the agent choose the right tools based on the task and provide the correct inputs for each tool. This ensures the agent selects and executes tools effectively.

Selecting the Right Tool Via OpenAI Function Calling Capabilities 💫

One reliable method for choosing tools and returning structured outputs is OpenAI's Function Calling. Introduced on June 13, 2023, this feature allows developers to describe functions to models trained to generate a JSON object with the necessary arguments based on user input.

💡
The "Function Calling" feature does NOT execute tools. It only provides the necessary information to an agent to execute the tool.

Let's see how this works:

Figure 9: An illustration demonstrating a basic OpenAI function calling flow.
  1. The user submits a query specifying a task and the available tools.
  2. The LLM selects the right tool for the task, supplying a unique ID, tool name, and tool arguments as a JSON string for the agent to execute.
  3. The AI agent parses the JSON string, executes the tool with the provided arguments, and submits the results back as a Tool message.
  4. Finally, the LLM summarizes the tool execution results within the user's context, delivering a comprehensive final response.
💡
Steps 2-3 can be repeated multiple times depending on the task 🔥.

Testing OpenAI's Function Calling Capability 🔍

1️⃣ Update OpenAI Chat Completion Client

First, I needed to update the client to accept an additional tools parameter, allowing users to pass the tool details along with the user message.

from typing import Dict, Any, List
import openai

class OpenAIChatCompletion:
    """Interacts with OpenAI's API for chat completions."""
    def __init__(self, model: str, api_key: str = None, base_url: str = None):
        self.client = openai.OpenAI(api_key=api_key, base_url=base_url)
        self.model = model

    def generate(self, messages: List[Dict], tools: List[Dict[str, Any]] = None, **kwargs) -> Dict[str, Any]:
        """Generates a response from OpenAI's API."""
        params = {'messages': messages, 'model': self.model, 'tools': tools, **kwargs}
        response = self.client.chat.completions.create(**params)
        return response.choices[0].message

2️⃣ Update Agent Class to Integrate Tools

Next, I updated the Agent class to pass tools to the LLM client.

from typing import Dict

class Agent:
    """Integrates LLM client, tools, and memory."""
    def __init__(self, llm_client, system_message: Dict[str, str], tools=None):
        self.llm_client = llm_client
        self.tools = tools
        self.memory = ChatMessageMemory()
        self.system_message = system_message

    def run(self, user_message: Dict[str, str]):
        self.memory.add_message(user_message)
        chat_history = [self.system_message] + self.memory.get_messages()
        response = self.llm_client.generate(chat_history, tools=self.tools)
        self.memory.add_message(response)
        return response

3️⃣ Define Dummy Python Functions

I defined a few dummy Python functions to test these capabilities.

def get_weather(location: str) -> str:
    """Gets weather information."""
    return f"{location}: 80F."

def jump(distance: str) -> str:
    """Jumps a specific distance."""
    return f"I jumped the following distance {distance}"

4️⃣ Define the Python Function Signature as OpenAI Function Calls

I then defined a dictionary for each tool following the expected schema.

get_weather_func_dict = {
    'type': 'function',
    'function': {
        'name': 'get_weather',
        'description': 'Get weather information based on location.',
        'parameters': {
            'properties': {'location': {'type': 'string'}},
            'required': ['location'],
            'type': 'object'
        }
    }
}

jump_func_dict = {
    'type': 'function',
    'function': {
        'name': 'jump',
        'description': 'Jump a specific distance.',
        'parameters': {
            'properties': {'distance': {'type': 'string'}},
            'required': ['distance'],
            'type': 'object'
        }
    }
}

5️⃣ Initialize Agent and Test It.

I then initialized the client to generate a response from the LLM, including tools.

client = OpenAIChatCompletion(base_url='https://api.openai.com/v1', model='gpt-4')

# Define the system message
system_message = {"role": "system", "content": "You are a helpful assistant."}

# Gather all available tools
tools = [get_weather_func_dict, jump_func_dict]

# Initialize the Agent with the LLM client, system message, and tools
agent = Agent(llm_client=client, system_message=system_message, tools=tools)

# Define a user message
user_message = {"role": "user", "content": "What is the weather in Virginia?"}

# Generate a response using the agent
response = agent.run(user_message)
response
ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_HFyUnaAmRc9trG4HdBwdjg7v', function=Function(arguments='{\n "location": "Virginia"\n}', name='get_weather'), type='function')])

6️⃣ Capture Assistant's Response and Extract Tool Details

The LLM returned a message with the tool details and suggested arguments.

import json

tool_response = response.tool_calls[0]
tool_arguments = json.loads(tool_response.function.arguments)
tool_arguments
{'location': 'Virginia'}

7️⃣ Execute the Tool with the Function Arguments

tool_execution_results = get_weather(**tool_arguments)
tool_execution_results
'Virginia: 80F.'

8️⃣ Craft a Tool Message and Send It to the LLM for a final response

tool_message = {
    "role": "tool",
    "tool_call_id": tool_response.id,
    "name": tool_response.function.name,
    "content": str(tool_execution_results)
}

final_response = agent.run(tool_message)
final_response
ChatCompletionMessage(content='The current weather in Virginia is 80°F.', role='assistant', function_call=None, tool_calls=None)

Awesome, right? But how do we ensure the tool arguments are valid? 🤔

Enter Pydantic: Validating Tool Arguments 🔍

Pydantic is a powerful library that leverages Python type hints for efficient data validation and serialization. Using Pydantic models, we can define and validate tool arguments, ensuring correct data types and structure.

Define Tool Argument Schema as a Pydantic Model

First, I defined the tool's argument schema using Pydantic to validate the suggested tool arguments. For example:

# Python function
def get_weather(location: str) -> str:
    """Get weather information based on location."""
    return f"{location}: 80F."

# Pydantic Model
from pydantic import BaseModel, Field

class GetWeatherSchema(BaseModel):
    """Get weather information based on location."""
    location: str = Field(description="Location to get weather for")

Validate Suggested Tool Argument Schema

To validate the suggested tool argument schema, follow the previous steps up to initializing the agent and generating a response.

6️⃣ Capture Assistant's Response and Extract Tool Details

The LLM returned a message with the tool details and suggested arguments.

# Extract tool arguments (JSON string)
tool_response = response.tool_calls[0]
tool_arguments = tool_response.function.arguments

Instead of manually converting the JSON string, validate it directly with Pydantic's model_validate_json() utility.

response_model = GetWeatherSchema.model_validate_json(tool_arguments)
response_model
GetWeatherSchema(location='Virginia')

You can also generate JSON schemas from Pydantic models using Pydantic's model_json_schema() utility.

GetWeatherSchema.model_json_schema()
{'description': 'Get weather information based on location',
'properties': {'location': {'description': 'location to get weather for',
'title': 'Location',
'type': 'string'}},
'required': ['location'],
'title': 'GetWeatherSchema',
'type': 'object'}
💡
These JSON schemas are JSON Schema Draft 2020-12 and OpenAPI Specification v3.1.0 compliant, which makes it very easy to convert Python functions to OpenAI's Function Calling format 🤯.

Formatting Tools for OpenAI Function Calls with Pydantic Models 🔥

With tool argument validation in place, I moved on to formatting tools for OpenAI's function calling feature using Pydantic models.

Figure 10: An illustration showing a chat completion payload with tools.

The idea is to use the tool's Pydantic model to create its function call format.

Figure 11: An illustration of going from Python function to Pydantic model to function call.

OpenAI's Function Calling Format

OpenAI's Function Calling feature requires tool definitions in a specific format, including the function's name, description, and parameters. The current API wraps these fields in a tools parameter, as shown below.

"tools": [
    {
        "type": "function",
        "function": {
            "name": "function_name",
            "description": "Function description. (optional)",
            "parameters": "The parameters the functions accepts, described as a JSON Schema object. (optional)"
        }
    }
]
💡
Other models with 'Function Calling' capabilities might still use only name, description and parameters when defining tools.

From Pydantic Models to OpenAI Function Calls

I used the previous Pydantic model GetWeatherSchema and created a function to convert the tool's argument schema into OpenAI's Function Calling format.

def to_openai_function_call_definition(name: str, model: BaseModel):
    schema_dict = model.model_json_schema()
    description = schema_dict.pop("description", "")
    schema_dict.pop("title", None)  # Remove the title field to exclude the model name
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": schema_dict
        }
    }

Testing this new capability:

function_call_definition = to_openai_function_call_definition("get_weather", GetWeatherSchema)
function_call_definition
{'type': 'function',
'function': {'name': 'get_weather',
'description': 'Get weather information based on location',
'parameters': {'properties': {'location': {'description': 'location to get weather for',
'title': 'Location',
'type': 'string'}},
'required': ['location'],
'type': 'object'}}}

Streamlining the Agent Tool Concept ⚒️

To streamline the process of converting Python functions to OpenAI's Function Calling format, I created a dedicated class.

The Agent Tool Class 🔧

I developed the AgentTool class to encapsulate a Python function and its Pydantic model. This class includes methods for converting the tool to OpenAI's format and generating a JSON schema of the model.

from pydantic import BaseModel
from typing import Callable, Type

class AgentTool:
    """Encapsulates a Python function with Pydantic validation."""
    def __init__(self, func: Callable, args_model: Type[BaseModel]):
        self.func = func
        self.args_model = args_model
        self.name = func.__name__
        self.description = func.__doc__ or self.args_schema.get('description', '')

    def to_openai_function_call_definition(self) -> dict:
        """Converts the tool to OpenAI Function Calling format."""
        schema_dict = self.args_schema
        description = schema_dict.pop("description", "")
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": description,
                "parameters": schema_dict
            }
        }

    @property
    def args_schema(self) -> dict:
        """Returns the tool's function argument schema as a dictionary."""
        schema = self.args_model.model_json_schema()
        schema.pop("title", None)
        return schema

The Tool Decorator ⚙️

To simplify creating AgentTool instances, I developed a Python Decorator. It ensures a function has a docstring and wraps it as an AgentTool with its model.

from typing import Callable, Optional, Type
from pydantic import BaseModel

def check_docstring(func: Callable):
    """Ensure the function has a docstring."""
    if not func.__doc__:
        raise ValueError(f"Function '{func.__name__}' must have a docstring.")

def Tool(func: Optional[Callable] = None, *, args_model: Type[BaseModel]) -> AgentTool:
    """Decorator to wrap a function with an AgentTool instance."""
    def decorator(f: Callable) -> AgentTool:
        check_docstring(f)
        return AgentTool(f, args_model=args_model)
    return decorator(func) if func else decorator

You could then use the Tool decorator to create an AgentTool and convert it to OpenAI's Function Calling format.

# Python function arguments schema
from pydantic import BaseModel, Field

class GetWeatherSchema(BaseModel):
    location: str = Field(description="location to get weather for")

@Tool(args_model=GetWeatherSchema)
def get_weather(location: str) -> str:
    """Get weather information based on location."""
    return f"{location}: 80F."

# Convert the AgentTool to OpenAI's Function Calling format
get_weather.to_openai_function_call_definition()
{'type': 'function',
'function': {'name': 'get_weather',
'description': '',
'parameters': {'properties': {'location': {'description': 'location to get weather for',
'title': 'Location',
'type': 'string'}},
'required': ['location'],
'type': 'object'}}}

Validating Tool Arguments before Agent Tool Execution

To ensure tool arguments were validated before execution, I added functionality to the AgentTool class. This involved creating a method to validate JSON strings using Pydantic's model_validate_json() utility and modifying the call function to validate parameters before executing the tool.

Here are the new methods added to the AgentTool class:

from pydantic import BaseModel, ValidationError
from typing import Any, Callable, Type
from inspect import signature

class AgentTool:
    # Existing code...

    def validate_json_args(self, json_string: str) -> bool:
        """Validate JSON string using the Pydantic model."""
        try:
            validated_args = self.args_model.model_validate_json(json_string)
            return isinstance(validated_args, self.args_model)
        except ValidationError:
            return False

    def run(self, *args, **kwargs) -> Any:
        """Execute the function with validated arguments."""
        try:
            # Handle positional arguments by converting them to keyword arguments
            if args:
                sig = signature(self.func)
                arg_names = list(sig.parameters.keys())
                kwargs.update(dict(zip(arg_names, args)))

            # Validate arguments with the provided Pydantic schema
            validated_args = self.args_model(**kwargs)
            return self.func(**validated_args.model_dump())
        except ValidationError as e:
            raise ValueError(f"Argument validation failed for tool '{self.name}': {str(e)}")
        except Exception as e:
            raise ValueError(f"An error occurred during the execution of tool '{self.name}': {str(e)}")

    def __call__(self, *args, **kwargs) -> Any:
        """Allow the AgentTool instance to be called like a regular function."""
        return self.run(*args, **kwargs)

With the new methods in the AgentTool class, you can now validate tool arguments and execute the function with validated parameters. Similar to the previous example, initialize an AgentTool with the Tool decorator. Then, validate a dummy tool argument JSON string and run the tool:

tool_json_arguments = '{\n  "location": "Virginia"\n}'

# Validate the JSON string arguments
get_weather.validate_json_args(tool_json_arguments)  # True

# Convert JSON string to dictionary and run the tool
tool_arguments_dict = json.loads(tool_json_arguments)
get_weather.run(**tool_arguments_dict)
'Virginia: 80F.'

One Final Recap! 😅 Just to Clarify! 📝

With these updates, I defined a Pydantic model for function arguments, used the Tool decorator to create an AgentTool, converted it to OpenAI's Function Calling format, and sent it along with a user message to the LLM.

Figure 12: An illustration to show the flow to go from function to Agent Tool to Function Call.

When the LLM suggests tool calls, we validate the arguments with AgentTool's validate_json_args method. Once validated, we run the tool.

Figure 13: An illustration showing the validation and execution of tools by an agent.
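
Putting those two steps together, here is a minimal sketch of that agent-side handling, assuming response is a ChatCompletionMessage with tool_calls (like the one returned in the earlier test) and get_weather is the AgentTool created with the Tool decorator:

import json

tool_call = response.tool_calls[0]
tool_args_json = tool_call.function.arguments

# Validate the suggested arguments against the tool's Pydantic model before executing
if get_weather.validate_json_args(tool_args_json):
    execution_results = get_weather.run(**json.loads(tool_args_json))
else:
    execution_results = f"Invalid arguments for tool '{get_weather.name}': {tool_args_json}"

# Relay the results back to the LLM as a tool message
tool_message = {
    "role": "tool",
    "tool_call_id": tool_call.id,
    "name": tool_call.function.name,
    "content": str(execution_results)
}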

Operationalizing Tool Execution ⚒️ -> ⚡️

After setting up our agent to select and prepare tools with the help of LLMs, the next step was to manage these tools effectively. It was essential to have a system that could handle tool availability and execution seamlessly.

The Agent Tool Executor Class

I created an AgentToolExecutor class to manage the agent's interaction with tools. This class ensures tools are registered, managed, and executed properly, bridging the gap between choosing the right tool and executing it effectively.

from typing import Any, Dict, List, Optional

class AgentToolExecutor:
    """Manages tool registration and execution."""
    
    def __init__(self, tools: Optional[List[AgentTool]] = None):
        self.tools: Dict[str, AgentTool] = {}
        if tools:
            for tool in tools:
                self.register_tool(tool)
    
    def register_tool(self, tool: AgentTool):
        """Registers a tool."""
        if tool.name in self.tools:
            raise ValueError(f"Tool '{tool.name}' is already registered.")
        self.tools[tool.name] = tool
      
    def execute(self, tool_name: str, *args, **kwargs) -> Any:
        """Executes a tool by name with given arguments."""
        tool = self.tools.get(tool_name)
        if not tool:
            raise ValueError(f"Tool '{tool_name}' not found.")
        try:
            return tool(*args, **kwargs)
        except Exception as e:
            raise ValueError(f"Error executing tool '{tool_name}': {e}") from e
    
    def get_tool_names(self) -> List[str]:
        """Returns a list of all registered tool names."""
        return list(self.tools.keys())
    
    def get_tool_details(self) -> str:
        """Returns details of all registered tools."""
        tools_info = [f"{tool.name}: {tool.description} Args schema: {tool.args_schema['properties']}" for tool in self.tools.values()]
        return '\n'.join(tools_info)
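
Before wiring it into the agent, here is a quick sketch of how the executor is used on its own, assuming get_weather is the AgentTool created earlier with the Tool decorator:

# Register the AgentTool instances created earlier
executor = AgentToolExecutor(tools=[get_weather])

executor.get_tool_names()
# ['get_weather']

# Execute a tool by name with keyword arguments
executor.execute('get_weather', location='Virginia')
# 'Virginia: 80F.'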

🚦Assembly Line: Integrating Tool Executor with AI Agent 🤖

With the main components nearly complete, I refined the Agent base class by registering tools as AgentTool objects, converting them to OpenAI's Function Calling format, and managing tool calls using the AgentToolExecutor. Each tool execution was recorded in a tool_history array and relayed back to the LLM, ensuring a seamless flow of interaction.

I introduced a parse_response method to handle LLM responses suggesting tool executions. This method checks for tools, executes them, and updates the tool history for accurate context. Here's how I implemented these capabilities:

import logging
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)

class Agent:
    """Integrates LLM client, tools, memory, and manages tool executions."""
    
    def __init__(self, llm_client, system_message: Dict[str, str], max_iterations: int = 10, tools: Optional[List[AgentTool]] = None):
        self.llm_client = llm_client
        self.executor = AgentToolExecutor()
        self.memory = ChatMessageMemory()
        self.system_message = system_message
        self.max_iterations = max_iterations
        self.tool_history = []
        self.function_calls = None
        
        # Register and convert tools
        if tools:
            for tool in tools:
                self.executor.register_tool(tool)
            self.function_calls = [tool.to_openai_function_call_definition() for tool in tools]

    def run(self, user_message: Dict[str, str]):
        """Generates responses, manages tool calls, and updates memory."""
        self.memory.add_message(user_message)

        for _ in range(self.max_iterations):
            chat_history = [self.system_message] + self.memory.get_messages() + self.tool_history
            response = self.llm_client.generate(chat_history, tools=self.function_calls)

            if self.parse_response(response):
                continue
            else:
                self.memory.add_message(response)
                self.tool_history = []
                return response

    def parse_response(self, response) -> bool:
        """Executes tool calls suggested by the LLM and updates tool history."""
        import json
        
        if response.tool_calls:
            self.tool_history.append(response)
            for tool in response.tool_calls:
                tool_name = tool.function.name
                tool_args = tool.function.arguments
                tool_args_dict = json.loads(tool_args)
                try:
                    logger.info(f"Executing {tool_name} with args: {tool_args}")
                    execution_results = self.executor.execute(tool_name, **tool_args_dict)
                    self.tool_history.append({
                        "role": "tool",
                        "tool_call_id": tool.id,
                        "name": tool_name,
                        "content": str(execution_results)
                    })
                except Exception as e:
                    raise ValueError(f"Execution error in tool '{tool_name}': {e}") from e
            return True
        return False

To test the new Agent class, follow the steps in previous examples. Initialize the client, define tools with the Tool decorator, initialize the agent with those tools and the LLM client, and finally, run the agent with your query.

# Set Environment Variables
from dotenv import load_dotenv
load_dotenv()  # take environment variables from .env.

# Initialize LLM Client
client = OpenAIChatCompletion(base_url='https://api.openai.com/v1', model='gpt-4')

# Initialize Agent with tools and LLM client
# Define Agent Tools (jump and get_weather) with their respective Pydantic models describing their argument schema.
...
.....

# Define the system message
system_message = {"role": "system", "content": "You are a helpful assistant."}

# Initialize the Agent with the LLM client, system message, and tools
agent = Agent(llm_client=client, system_message=system_message, tools=tools)

# Run Task
# Define a user message
user_message = {"role": "user", "content": "What is the weather in Virginia?"}

# Generate a response using the agent
response = agent.run(user_message)
response
ChatCompletionMessage(content='The weather in Virginia is currently 80°F.', role='assistant', function_call=None, tool_calls=None)

Would it Handle Multiple Tool Executions? 😎

You can adjust the user message to ask a question that will require a tool to run multiple times. You can enable logging to see tools being executed.

# Enable Logging
import logging
logging.basicConfig(level=logging.INFO)

# Reset previous interactions
agent.memory.reset_memory()

# Define a new task
user_message = {"role": "user", "content": "What is the weather in Virginia, Washington and New York?"}

# Generate a response using the agent
response = agent.run(user_message)
response

INFO:__main__:Executing get_weather with args: {"location": "Virginia"}
INFO:__main__:Executing get_weather with args: {"location": "Washington"}
INFO:__main__:Executing get_weather with args: {"location": "New York"}

ChatCompletionMessage(content='The current weather is:\n\n- Virginia: 80°F\n- Washington: 80°F\n- New York: 80°F', role='assistant', function_call=None, tool_calls=None)

Even though the dummy function always returns the same result, the get_weather tool was executed 3 times. Up to this point, I had a basic functional LLM-based AI agent using Function Calling capabilities.

The following notebook contains all the code to build this agent, which I will modify in the next sections:

GenAI-Security-Adventures/experiments/Agents/OAI-Blog-Basic-ToolCalling-Agent-Scratch.ipynb at main · OTRF/GenAI-Security-Adventures

What Models Support Function Calling? 🧠

What About Models That Do Not Support Function Calling? 🗣️ 🧠

For models without Function Calling, we use prompt engineering to add decision-making logic directly into the prompts. This guides the LLM to choose the right tools based on the prompts and user's intent.

Selecting the Right Tool via Prompting 🗣️

To make the LLM select the right tool, include context about tools and their argument schemas in the prompt. Then, instruct the LLM to return the name of the chosen tool along with the suggested tool arguments.

🗣️ "You have access to the following tools" -> 🧠

Figure 14: An illustration showing a basic example of tool selection by an LLM.

🗣️ "Use a JSON Blob to Specify a Tool" -> 🧠

Instruct the LLM to return tool details in formats like JSON, XML, or YAML. For example, requesting a JSON blob helps the LLM structure its response as a JSON object, which the agent can then parse and validate against a predefined schema.

Figure 15: An illustration showing an example of tool selection and structured response.

You can test this with our current agent by sending a prompt that includes tool details and an instruction to respond with a JSON object; no AgentTool instances are needed here.

# Setup client with environment variables
client = OpenAIChatCompletion(base_url='https://api.openai.com/v1', model='gpt-4')

# Define the system message
system_message = {"role": "system", "content": "You are a helpful assistant."}

# Initialize the Agent with the LLM client and system message
agent = Agent(llm_client=client, system_message=system_message)

USER_PROMPT = """
You have access to the following tools:

search(topic: str) - Use it to search information.
greet(person: str) - Use it to greet a person.

Respond to the user with the right tool and input whenever needed.
When responding to the user, provide only ONE tool per $JSON_BLOB, as shown in the example below delimited by triple backticks:

```
{{
    "name": $TOOL_NAME,
    "arguments": $INPUT
}}
```

User input: Hello, this is roberto!
"""

# Define a user message
user_message = {"role": "user", "content": USER_PROMPT}

# Generate a response using the agent
response = agent.run(user_message)
response
ChatCompletionMessage(content='{\n "name": "greet",\n "arguments": "Roberto"\n}', role='assistant', function_call=None, tool_calls=None)

After receiving the response, pray 😅 and parse the message content:

import json

json.loads(response.content)
{'name': 'greet', 'arguments': 'roberto'}

What about JSON Within Text? 🔍

Instructing an LLM to output JSON can result in mixed explanatory text and structured responses. When this happens, we can use regex patterns to extract the JSON embedded within the text.

For parsing complex nested JSON structures, Python's re module isn't sufficient since it lacks nested pattern matching support. The regex library, however, allows recursion with patterns like r'\{(?:[^{}]|(?R))*\}', enabling effective parsing of deeply nested JSON data. (https://stackoverflow.com/a/54235803).

import regex
import json

def parse_nested_json(text):
    # Unescape backslashes
    text = text.replace('\\\\n', '\\n').replace('\\n', '\n').replace('\\\'', '\'').replace('\\\\', '\\')
    # Replace double curly braces with single curly braces
    text = text.replace('{{', '{').replace('}}', '}')
    pattern = regex.compile(r'\{(?:[^{}]|(?R))*\}') # Supports nested structures
    match = pattern.search(text)
    if match:
        try:
            return json.loads(match.group())
        except json.JSONDecodeError:
            pass
    return None  # No valid JSON found or parsing error
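
A quick example of how this helper behaves on a made-up response that mixes prose with a JSON blob:

llm_output = (
    "Sure! I will use the get_weather tool.\n"
    '{\n  "name": "get_weather",\n  "arguments": {"location": "Virginia"}\n}'
)

parse_nested_json(llm_output)
# {'name': 'get_weather', 'arguments': {'location': 'Virginia'}}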

Implementing a Prompt Formatter 🗣️

It became clear that I needed a way to dynamically craft prompt templates that could incorporate elements like tool details, user input, and even chat history. To achieve this, I created the StringPromptTemplate class.

The String Prompt Template Class

import re

class StringPromptTemplate:
    """Handles dynamic prompt formatting for an AI agent."""

    def __init__(self, template: str):
        """Initializes the prompt template and extracts variables."""
        self.template = template
        self.variables = {}
        self.required_variables = self.extract_variables()

    def extract_variables(self):
        """Extracts placeholders from the template."""
        return set(re.findall(r'\{(.*?)\}', self.template))

    def update_variables(self, **kwargs):
        """Updates template variables."""
        self.variables.update(kwargs)
        self.required_variables -= set(kwargs.keys())

    def format_prompt(self, **kwargs):
        """Generates a formatted prompt and tracks remaining variables."""
        combined_variables = {**self.variables, **kwargs}
        self.required_variables -= set(kwargs.keys())
        return self.template.format(**combined_variables)

Test the new class by updating the USER_PROMPT string, replacing hardcoded tools with a {tool_details} placeholder and initializing an instance.

USER_PROMPT = """
You have access to the following tools:

{tool_details}

Respond to the user with the right tool and input whenever needed.
When responding to the user, provide only ONE tool per $JSON_BLOB, as shown in the example below delimited by triple backticks:

```
{{
    "name": $TOOL_NAME,
    "arguments": $INPUT
}}
```

User input: What is the weather in Virginia?
"""
formatter = StringPromptTemplate(USER_PROMPT)

Then, define previous tools with their respective Pydantic models.

# Python function arguments schema
from pydantic import BaseModel, Field

class GetWeatherSchema(BaseModel):
    location: str = Field(description="location to get weather for")

@Tool(args_model=GetWeatherSchema)
def get_weather(location: str) -> str:
    """Get weather information based on location."""
    return f"{location}: 80F."

class JumpSchema(BaseModel):
    distance: str = Field(description="Distance for agent to jump")

@Tool(args_model=JumpSchema)
def jump(distance: str) -> str:
    """Jump a specific distance."""
    return f"I jumped the following distance {distance}"

tools = [get_weather, jump]

Get tool details and replace the tool_details placeholder.

# Get tool details
tools_info = [f"{tool.name}: {tool.description} Args schema: {tool.args_schema['properties']}" for tool in tools]
tool_details = '\n'.join(tools_info)

user_prompt_formatted = formatter.format_prompt(tool_details=tool_details)
print(user_prompt_formatted)
You have access to the following tools:

get_weather: Get weather information based on location. Args schema: {'location': {'description': 'location to get weather for', 'title': 'Location', 'type': 'string'}}
jump: Jump a specific distance. Args schema: {'distance': {'description': 'Distance for agent to jump', 'title': 'Distance', 'type': 'string'}}

Respond to the user with the right tool and input whenever needed...

🚦Assembly Line: Integrating Prompt Formatter with AI Agent

With control over the prompt, I needed to adapt the agent class to choose the right tools based on the prompt rather than OpenAI's Function Calling feature. The new class integrates the prompt formatter, dynamically processing the prompt template each time the agent runs.

import logging
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)

class Agent:
    """Integrates key components and manages tool executions."""
    
    def __init__(self, llm_client, system_message: Dict[str, str], max_iterations: int = 10, tools: Optional[List[AgentTool]] = None, prompt_template: StringPromptTemplate = None):
        self.llm_client = llm_client
        self.executor = AgentToolExecutor()
        self.memory = ChatMessageMemory()
        self.system_message = system_message
        self.max_iterations = max_iterations
        self.tool_history = []
        self.function_calls = None
        self.prompt_template = prompt_template

        if tools:
            for tool in tools:
                self.executor.register_tool(tool)
            self.function_calls = [tool.to_openai_function_call_definition() for tool in tools]

        tool_details = self.executor.get_tool_details()
        tool_names = ' or '.join(self.executor.get_tool_names())
        self.prompt_template.update_variables(
            system_message=self.system_message,
            tool_details=tool_details,
            tool_names=tool_names
        )

    def run(self, task: str):
        """Generates responses, manages tool calls, and updates memory."""
        self.memory.add_message({"role": "user", "content": task})

        for _ in range(self.max_iterations):
            chat_history = self.messages_to_string()
            formatted_message = self.prompt_template.format_prompt(chat_history=chat_history, user_input=task)
            messages = [{"role": "user", "content": formatted_message}]
            response = self.llm_client.generate(messages=messages)
            # The user message is already in memory; store only the assistant response
            self.memory.add_message(response)
            return response

    def parse_response(self):
        """Parses LLM response."""
        pass
    
    def messages_to_string(self) -> str:
        """Converts stored messages to a multi-line string."""
        formatted_messages = []
        for message in self.memory.get_messages():
            if not isinstance(message, dict):
                message = message.model_dump()  # e.g., ChatCompletionMessage responses stored in memory
            formatted_messages.append(f"{message['role'].capitalize()}: {message['content']}")
        return "\n".join(formatted_messages)

Enabling Tool Execution v2 ⚒️

To handle tool execution, update the run method to parse the LLM response and execute suggested tools. We are not enabling any execution loops yet.

def run(self, task: str):
    """Generates responses, manages tool calls, and updates memory."""
    self.memory.add_message({"role": "user", "content": task})

    for _ in range(self.max_iterations):
        chat_history = self.messages_to_string()
        formatted_message = self.prompt_template.format_prompt(chat_history=chat_history, user_input=task)
        messages = [{"role": "user", "content": formatted_message}]
        response = self.llm_client.generate(messages=messages)
        action_dict = self.parse_response(response)

        if action_dict:
            action_name = action_dict["name"]
            action_arguments = action_dict["arguments"]
            logger.info(f"Executing {action_name} with arguments {action_arguments}")
            execution_results = self.executor.execute(action_name, **action_arguments)
            return execution_results
        else:
            logger.info("Agent is responding directly.")
            # The user message is already in memory; store only the assistant response
            self.memory.add_message(response)
            return response

def parse_response(self, response: Dict):
    """Extracts tools or continues the conversation."""
    import regex, json

    pattern = regex.compile(r'\{(?:[^{}]|(?R))*\}')  # Supports nested structures
    message_content = response.content
    # Unescape backslashes
    message_content = message_content.replace('\\\\n', '\\n').replace('\\n', '\n').replace('\\\'', '\'').replace('\\\\', '\\')
    # Replace double curly braces with single curly braces
    message_content = message_content.replace('{{', '{').replace('}}', '}')
    match = pattern.search(message_content)
    if match:
        action_content = match.group()
        try:
            action_dict = json.loads(action_content.strip())
            return action_dict
        except json.JSONDecodeError:
            raise ValueError("Invalid JSON in action content")
    return None

You can test this new agent mode with the following prompt template:

STRING_PROMPT_TEMPLATE = """
{system_message}. You have access to the following tools:

{tool_details}

Respond to the user with the right tool and input whenever needed.

Valid tool name values: {tool_names}.

When responding to the user, provide only ONE tool per $JSON_BLOB, as shown in the example below delimited by triple backticks:

```
{{
    "name": $TOOL_NAME,
    "arguments": $INPUT
}}
```

Previous conversation history:
{chat_history}

New conversation:
{user_input}
"""

Initialize the prompt template as a StringPromptTemplate:

prompt_template = StringPromptTemplate(STRING_PROMPT_TEMPLATE)

Initialize agent with an LLM client, system message, tools and prompt template.

# Initialize LLM client
client = OpenAIChatCompletion(base_url='https://api.openai.com/v1', model='gpt-4')

# Define the system message
system_message = {"role": "system", "content": "You are a helpful assistant."}

# Initialize the Agent with the LLM client and system message
agent = Agent(llm_client=client, system_message=system_message, tools=tools, prompt_template=prompt_template)

Finally, run the Agent.

agent.run("What is the weather in New York?")
'New York: 80F.'

While I could drive reasoning via the prompt and execute suggested tools, I needed to manage the entire loop of reasoning, tool selection, action, and repetition. Unlike OpenAI's Function Calling, which handled this loop with Tool messages, I had to implement this logic through prompting techniques.

My next step was to create a method to ensure the LLM reasoned over the results and suggested the next steps effectively, maintaining the interaction flow.

Enter ReAct (Reason 🧠+ Act ⚡️)!

In 2022, ReAct was introduced to combine reasoning and acting, enabling LLM-based AI agents to not only figure things out but also to interact with the world around them, making decisions and taking steps based on their reasoning.

Figure 16: Illustration showing the intersection of reason and act. (source: Yao et al. 2023).

ReAct involves AI agents thinking, acting, and observing:

  • Thought: Agent analyzes the situation and generates a thought.
  • Action: Based on the thought, the agent takes an action.
  • Observation: After acting, the agent observes the outcome to assess the impact and effectiveness.

Agents adapt these steps based on the task. For deep thinking tasks, the agent cycles through thinking, acting, and observing repeatedly. In action-heavy tasks, the agent thinks at key moments, deciding when to act. This flexibility helps the agent handle various scenarios efficiently.

ReAct Prompting 🗣️

Inspired by LangChain's prompt template, I created a prompt that structures the agent's reasoning, acting, and observing into a seamless loop.

# References:
# https://smith.langchain.com/hub/hwchase17/react-chat
# https://smith.langchain.com/hub/hwchase17/structured-chat-agent

STRING_PROMPT_TEMPLATE = """
{system_message}. You have access to the following tools:

{tool_details}

Respond to the user with the right tool and input whenever needed.
Use a json blob to specify a tool by providing a name key (tool name) and an arguments key (tool input).

Valid "action" values:  {tool_names}

Provide only ONE action per $JSON_BLOB, as shown:
```
{{
    "name": $TOOL_NAME,
    "arguments": $INPUT
}}
```

Follow this format:

Question: input question to answer
Thought: reasoning about previous and subsequent steps
Action:
```
$JSON_BLOB
```
Observation: action result
... (repeat Thought/Action/Observation N times)
Thought: I know what to respond as final answer. Using tool to provide final answer
Final Answer: Final response to human

Remember to ALWAYS respond with a valid json blob of a single action when a tool MUST be used.
Use tools if necessary. Respond directly if appropriate.
Format is Action:```$JSON_BLOB``` then Observation

Previous conversation history:
{chat_history}

New conversation:
Question: {user_input}
{react_loop}
"""

🚦Assembly Line: Integrating ReAct Prompt with AI Agent 🤖

I had to update the run method a little bit to handle this new reasoning loop.

import logging
from typing import Dict, List, Optional
from pydantic import BaseModel

logger = logging.getLogger(__name__)

class Agent:
    """
    Basic Agent class responsible for integrating key components such as the LLM client, tools, memory, and managing tool executions.
    """
    def __init__(self, llm_client, system_message: Dict[str, str], max_iterations: int = 10, tools: Optional[List[AgentTool]] = None, prompt_template : StringPromptTemplate=None):
        self.llm_client = llm_client
        self.executor = AgentToolExecutor()
        self.memory = ChatMessageMemory()
        self.system_message = system_message
        self.max_iterations = max_iterations
        self.tool_history = []
        self.function_calls = None
        self.prompt_template = prompt_template  # Instance of StringPromptTemplate or similar
        
        # Register each tool passed to the Agent using the executor
        if tools:
            for tool in tools:
                self.executor.register_tool(tool)
            # Convert Agent Tools to OpenAI's Function Calling format
            self.function_calls = [tool.to_openai_function_call_definition() for tool in tools]
        
        # Pre-fill the prompt template with initial variables
        tool_details = self.executor.get_tool_details()
        tool_names = ' or '.join(self.executor.get_tool_names())
        self.prompt_template.update_variables(
            system_message=self.system_message,
            tool_details=tool_details,
            tool_names=tool_names
        )

    def run(self, task:str):
        # Get Chat History
        chat_history = self.messages_to_string()
        # Initialize ReAct Loop
        react_loop = ""

        # Showing Initial Task
        print(f"Question: {task}")

        for _ in range(self.max_iterations):
            # Generate a dynamic prompt using variables
            formatted_message = self.prompt_template.format_prompt(chat_history=chat_history,user_input=task,react_loop=react_loop)
            # Define everything as a user message
            messages = [{"role":"user", "content": formatted_message}]
            
            # Instruct LLM to choose the right tool and respond with a structured output
            response = self.llm_client.generate(messages=messages,stop=["\nObservation:"],)
            # Parse response and extract tools if any
            action_dict = self.parse_response(response)

            if action_dict:
                action_name = action_dict["name"]
                action_arguments = action_dict["arguments"]
                
                logger.info(f"Executing {action_name} with arguments {action_arguments}")
                execution_results = self.executor.execute(action_name, **action_arguments)
                
                current_iteration = f"\nThought: {response.content}\nObservation: {execution_results}"
                react_loop += current_iteration
                print(current_iteration)
            else:
                message_content = response.content
                print(message_content)
                if 'final answer' in str(message_content).lower():
                    final_message = str(message_content).lower().split("final answer:")[-1].strip()
                    response = {
                        "role": "assistant",
                        "content": final_message
                    }
                
                logger.info("Agent is responding directly.")
                self.memory.add_conversation(
                    user_message={"role":"user","content": task},
                    assistant_message=response
                )
                return response    

    def parse_response(self, response: Dict):
        """
        Extracts tools or continues the conversation.
        """
        import regex, json

        pattern = regex.compile(r'\{(?:[^{}]|(?R))*\}')  # Supports nested structures
        message_content = response.content
        # Unescape backslashes
        message_content = message_content.replace('\\\\n', '\\n').replace('\\n', '\n').replace('\\\'', '\'').replace('\\\\', '\\')
        # Replace double curly braces with single curly braces
        message_content = message_content.replace('{{', '{').replace('}}', '}')
        match = pattern.search(message_content)
        if match:
            action_content = match.group()
            try:
                action_dict = json.loads(action_content.strip())
                return action_dict
            except json.JSONDecodeError:
                raise ValueError("Invalid JSON in action content")
    
    def messages_to_string(self) -> str:
        """
        Converts a list of message objects or dictionaries into a multi-line string representation.
        """
        formatted_messages = []

        for message in self.memory.get_messages():
            if isinstance(message, BaseModel):
                message = message.model_dump()
            formatted_messages.append(f"{message['role'].capitalize()}: {message['content']}")
        return "\n".join(formatted_messages)

I updated the LLM Client class to use the stop parameter in OpenAI's Chat Completion API, stopping token generation at \nObservation:

from typing import Dict, Any, List
import openai

class OpenAIChatCompletion:
    """Handles interaction with OpenAI's API for generating chat completions."""
    def __init__(self, model: str, api_key: str = None, base_url: str = None):
        self.client = openai.OpenAI(api_key=api_key, base_url=base_url)
        self.model = model

    def generate(self, messages: List[Dict], tools: List[Dict[str, Any]] = None, stop: List[str] = None, **kwargs) -> Dict[str, Any]:
        """Generate a response from OpenAI's API based on input messages."""
        params = {
            'messages': messages,
            'model': self.model,
            'tools': tools,
            'stop': stop,
            **kwargs
        }
        response = self.client.chat.completions.create(**params)
        return response.choices[0].message

You can test this new agent mode by initializing the ReAct prompt template defined above:

prompt_template = StringPromptTemplate(STRING_PROMPT_TEMPLATE)

# Initialize LLM client
client = OpenAIChatCompletion(base_url='https://api.openai.com/v1', model='gpt-4')

# Define the system message
system_message = {"role": "system", "content": "You are a helpful assistant."}

# Initialize the Agent with the LLM client and system message
agent = Agent(llm_client=client, system_message=system_message, tools=tools, prompt_template=prompt_template)

Run the agent with all those changes:

agent.run("What is the weather in New York?")

Take a look at the ReAct Loop:

Question: What is the weather in New York?

Thought: Thought: The user is asking for the weather information in New York. I will use the "get_weather" tool to provide this information.

Action:
```
{
    "name": "get_weather",
    "arguments": {"location": "New York"}
}
```
Observation: New York: 80F.
Thought: I observed that the tool successfully retrieved the current temperature in New York, to which is 80F.  Now I can respond to the user's query.

Final Answer: The current weather in New York is 80F.
{'role': 'assistant', 'content': 'the current weather in new york is 80f.'}

The following notebook contains all the code to build this ReAct agent:

GenAI-Security-Adventures/experiments/Agents/OAI-Blog-Basic-ReAct-Agent-Scratch.ipynb at main · OTRF/GenAI-Security-Adventures

Challenges with LLM-Based AI Agents:

  • Ensuring agents don’t unnecessarily use tools by including instructions and creating a "Human input tool."
  • Parsing LLM output for tool invocation and handling inconsistent JSON responses with an OutputParser (see the sketch after this list).
  • Enabling memory of previous steps using retrieval methods for context management.
  • Managing long observations by storing outputs and using retrieval techniques.
  • Keeping the agents focused by reiterating objectives and breaking tasks into sub-objectives.
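
For example, on the output-parsing challenge, here is a minimal, hypothetical sketch of a retry loop that reuses the parse_nested_json helper from earlier; the retry prompt wording is an assumption, not part of the agent above:

def generate_action_with_retries(llm_client, messages, max_retries=2):
    """Hypothetical helper: re-prompt the LLM when its JSON blob cannot be parsed."""
    for _ in range(max_retries + 1):
        response = llm_client.generate(messages=messages)
        action = parse_nested_json(response.content)  # helper defined earlier in this post
        if action is not None:
            return action
        # Feed the malformed reply back so the model can correct its formatting
        messages = messages + [
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": "Your last reply was not a valid JSON blob. Respond again with ONLY the $JSON_BLOB."}
        ]
    raise ValueError("Could not obtain a valid JSON action from the LLM.")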

If you made it this far, thank you for taking the time to read my blog post! I hope you found it insightful and helpful. Stay tuned for more updates!

Future Work

  • Share a cybersecurity use-case with these capabilities (next blog post 😎)
  • Explore other reasoning techniques besides ReAct.
  • Dive into long-term memory with knowledge graphs and vector stores.

References