What is agentic behavior?
In simple terms, agentic behavior refers to the way an agent acts. An agent is a working system, much like a human being. By definition, a system is a set of individual components working together to achieve a specific goal, and an agent achieves its goal by leveraging those components.
Now, it is important to understand that a goal can usually be reached along many different paths. There is no single prescribed solution: many solutions exist, but not all of them are effective. Some take a lot of time, while others consume both time and resources. The way an agent acts, that is, how it chooses a particular solution to achieve the goal, is what we call agentic behavior.
Another thing worth pointing out is that not all solutions are satisfactory even when the goal is achieved. For example, consider a human driving a car to reach point B. They can take the shortest route, which has heavy traffic congestion, or a longer route with little to no congestion. Either way, the driver has to trade off something to reach point B.
Agentic systems, like humans, must consider the environment and use appropriate resources to make smart decisions. This adaptability means the agent can automatically select resources based on the goal.
What are resources?
They are the information and tools an agent possesses. We will come back to this later in the article.
How can we evaluate the smartness of an agent?
Agents have artificial brains and memories. They can perceive, observe, think, reason, evaluate, and then make decisions. In AI, the brain is usually an artificial neural network trained on vast amounts of data to extract meaningful patterns and understand how knowledge is represented and distributed in that data. The type of data and the training technique determine how smart an agent is.
One of the most important and well-known training techniques is called reinforcement learning.
In reinforcement learning, the agent learns by making decisions and taking actions. For every right decision or action it is rewarded, and for every wrong one it is punished. The agent's goal is to maximize the reward by discovering which actions yield it. Not every action moves the agent toward the goal, and there can be obstacles along the way. That's why it is important for the agent to explore the available possibilities so that it can make optimal decisions. In other words, it should remember which actions bring rewards across everything it has seen.
Remembering the ins and outs of the training data is important because it helps the agent solve new and unfamiliar problems when they arise.
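To make the reward loop concrete, here is a minimal, self-contained sketch of tabular Q-learning on a made-up toy environment: a short corridor the agent must walk to reach a goal cell. Everything here, from the environment to the hyperparameters, is illustrative rather than how any production agent is trained.

```python
import random

# Hypothetical toy environment: a 1D corridor of 6 cells.
# The agent starts at cell 0 and is rewarded for reaching cell 5.
N_STATES, GOAL = 6, 5
ACTIONS = [-1, +1]          # move left or right

def step(state, action):
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if next_state == GOAL else -0.01   # small cost for every step taken
    return next_state, reward, next_state == GOAL

# Q-table: the agent's "memory" of the expected reward for each (state, action) pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate

for episode in range(200):
    state, done = 0, False
    while not done:
        # Explore occasionally, otherwise exploit the best-known action.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Nudge the estimate toward the reward plus discounted future value.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# After training, the greedy action in every non-goal cell is to move right.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)])   # -> [1, 1, 1, 1, 1]
```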
Now that you know what agentic behavior is, you may ask: how do large language models show these skills?
Now that we've explored how agents make decisions and use resources to achieve goals, let's see how LLMs fit into this picture. We've discussed how reinforcement learning helps agents learn from rewards and punishments to make better decisions. This naturally leads to the question of how modern AI systems like LLMs become "smart" agents.
LLMs, originally not designed as agents, have acquired some agent-like traits through their training. Their ability to understand goals, consider multiple paths, and select the best response based on learning is truly impressive, making them 'smart' agents in the world of AI.
Let's look at how the LLM's training process builds the foundation for agentic capabilities, step by step.
How do LLMs fit into agentic behavior? What role do they play?
LLMs are gigantic neural networks that process an input and generate information in response. These inputs, known as prompts, essentially retrieve relevant knowledge stored inside the model. In a way, LLMs serve as libraries.
An LLM goes through a three-phase training process:
Unsupervised pre-training, where it learns to predict the next token in a sequence.
Supervised fine-tuning, where it is trained on input-output pairs, such as question-answer examples, so that it learns to respond conversationally and follow the user's intent.
Reinforcement learning from human feedback (RLHF), a crucial phase in which human evaluators rate each output as good or bad. This feedback is used to train a reward model, and the reward model is then used to train the LLM to maximize the reward, which aligns the model toward safe and accurate outputs. This process is similar to how a driver might learn to navigate different routes based on the time of day or road conditions. Each training phase builds a different aspect of agentic behavior. Think of it like teaching a child: first they learn the language, then specific skills, and finally social rules and values.
Remember our driving example?
During RLHF, the model learns to consider factors like clarity, helpfulness, and accuracy, much as a driver considers the time of day, distance, and road conditions when choosing when and where to commute. The model is rewarded for outputs that humans rate as good and penalized for those rated poorly. Likewise, the driver can reach point B quickly by learning the right time to take a traffic-prone route or the right speed for a longer one.
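To illustrate the reward-model idea at the heart of RLHF, here is a minimal, self-contained sketch: a toy linear "reward model" trained on made-up human preference pairs with a Bradley-Terry style objective. Real reward models are neural networks over full text, and the later step of optimizing the LLM against this score (e.g. with PPO) is not shown.

```python
import math, random

# Hypothetical toy "reward model": a linear score over hand-crafted features of a
# response (length, politeness marker, exclamation). Purely illustrative.
def features(response):
    return [len(response) / 100.0, float("please" in response), float("!" in response)]

weights = [0.0, 0.0, 0.0]

def reward(response):
    return sum(w * f for w, f in zip(weights, features(response)))

# Human preference data: (chosen, rejected) pairs where annotators preferred the first
# response. These examples are entirely made up.
preferences = [
    ("Sure, here is a step-by-step answer, please check it.", "No."),
    ("The route via Main St is faster before 8am.", "Figure it out yourself!"),
]

# Bradley-Terry objective: maximize P(chosen preferred) = sigmoid(reward(chosen) - reward(rejected)).
lr = 0.1
for _ in range(500):
    chosen, rejected = random.choice(preferences)
    margin = reward(chosen) - reward(rejected)
    grad_scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))   # gradient of -log sigmoid(margin)
    for i, (fc, fr) in enumerate(zip(features(chosen), features(rejected))):
        weights[i] += lr * grad_scale * (fc - fr)

# The trained reward model now scores the preferred style higher; in full RLHF this
# score is the signal the LLM is optimized against.
print(reward("Sure, here is a step-by-step answer, please check it."), reward("No."))
```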
As you can see, there are two frameworks at play here:
An agentic system that automatically selects optimal solutions to achieve its goal.
An LLM that produces output based on what it has learned, aligned with human preferences. Human alignment refers to training the LLM to produce outputs that are in line with human safety and values; in this context, reinforcement learning is used specifically to align the LLM with those values, ensuring that its outputs are beneficial and safe for human use.

What if we connect agentic behavior with auto-regressive LLMs? In short, we can then leverage the power of autonomy under one framework. We don't have to use any other framework to get our jobs done. Just one prompt.
Now, let's connect these training methods to specific agentic capabilities:
Planning and decision-making emerge when the model is trained to break down complex tasks into steps. For example, an LLM agent can learn to solve a math problem by showing its work step-by-step rather than jumping straight to the answer. This capability develops through both supervised learning with step-by-step examples and reinforcement learning that rewards thorough reasoning.
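As a small illustration, here is a sketch of step-by-step prompting. The `call_llm` function is a hypothetical stand-in for any chat-completion API and simply returns a canned answer, so the script runs on its own.

```python
# A minimal sketch of step-by-step (chain-of-thought style) prompting.
def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; returns a canned, already step-by-step answer.
    return ("Step 1: The train covers 60 km in 1 hour, so its speed is 60 km/h.\n"
            "Step 2: In 2.5 hours it covers 60 * 2.5 = 150 km.\n"
            "Answer: 150 km")

question = "A train travels 60 km in one hour. How far does it go in 2.5 hours?"

# Asking for explicit steps encourages thorough reasoning instead of a bare guess.
prompt = (
    "Solve the problem below. Show your work as numbered steps, "
    "then give the final answer on a line starting with 'Answer:'.\n\n"
    f"Problem: {question}"
)

reply = call_llm(prompt)
print(reply)

# Downstream code can check that a plan was actually produced before trusting the answer.
assert "Step 1" in reply and "Answer:" in reply
```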

Tool use abilities come from specialized training, where the model learns when and how to use external resources. For instance, an LLM can be trained to recognize when it needs to use a calculator, search for information, or call an API. This is like teaching someone not just facts but when to use a reference book or calculator.
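Here is a hedged sketch of that tool-use loop: the "model" emits a structured tool request, the host application executes it, and the result is fed back. The JSON request format and the `call_llm` stub are invented for this example; real systems use their provider's function-calling schema.

```python
import json

# Hypothetical tools the agent may call. Real systems expose these to the model
# as function/tool schemas; here we keep a plain registry for illustration.
def calculator(expression):
    return str(eval(expression, {"__builtins__": {}}))   # toy calculator, not safe for untrusted input

TOOLS = {"calculator": calculator}

# Stand-in for the model: when it decides it needs a tool, it emits a JSON tool
# request instead of a final answer (this request format is made up for the sketch).
def call_llm(user_message, tool_result=None):
    if tool_result is None:
        return json.dumps({"tool": "calculator", "input": "23 * 47"})
    return f"23 multiplied by 47 is {tool_result}."

user_message = "What is 23 times 47?"

first = call_llm(user_message)
request = json.loads(first)
if "tool" in request:                                   # the model chose to use a tool
    result = TOOLS[request["tool"]](request["input"])   # the host runs it...
    final = call_llm(user_message, tool_result=result)  # ...and feeds the result back
else:
    final = first

print(final)   # -> "23 multiplied by 47 is 1081."
```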

Some advanced training techniques even create self-improvement capabilities, for instance when an LLM is trained to evaluate its own outputs. If an output is not satisfactory, the model is asked to try again. This kind of mechanism develops a form of self-reflection, similar to how we learn from our mistakes.
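A minimal sketch of such a generate, critique, retry loop might look like this. Both `generate` and `critique` are hypothetical stand-ins for LLM calls; the "bug" and the canned critique exist only to show the control flow.

```python
# A minimal sketch of a generate -> self-critique -> retry loop.
def generate(task, feedback=None):
    if feedback is None:
        return "def add(a, b): return a - b"          # first attempt contains a bug
    return "def add(a, b): return a + b"              # revised attempt after the critique

def critique(task, attempt):
    # The same (or another) model is asked whether the attempt satisfies the task.
    if "a - b" in attempt:
        return "FAIL: the function subtracts instead of adding."
    return "PASS"

task = "Write a Python function add(a, b) that returns the sum of a and b."

attempt = generate(task)
for _ in range(3):                                    # bounded number of retries
    verdict = critique(task, attempt)
    if verdict == "PASS":
        break
    attempt = generate(task, feedback=verdict)        # try again, conditioned on the critique

print(attempt)   # -> the corrected function
```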

The key point here is that agentic capabilities don't emerge by accident. They are deliberately cultivated through specific training objectives and methods. The more sophisticated these training techniques become, the more agentic behavior we see in LLMs.
OpenAI’s agent is a good example of how LLMs with vision capabilities, like GPT-4o, can in fact browse the web for user-defined queries. They call this agent Operator.

The image above explains how the Operator works. First, during perception, the agent captures screenshots that give it a visual snapshot of the computer's current state. This is similar to how a human would observe a screen before deciding what to click. Then comes reasoning, where the agent uses chain-of-thought processing to plan its next steps. This inner monologue isn't just fancy talk - it dramatically improves performance by helping the agent evaluate what it sees, track its progress, and adapt dynamically to changing conditions on screen.
Finally, the agent takes action - clicking, scrolling, or typing as needed until the task is complete. What’s interesting is that this agent knows its limits. For sensitive actions like entering passwords or dealing with CAPTCHA forms, it asks for human assistance. This demonstrates another key aspect of good agentic behavior: knowing when to act independently and when to seek help. This perception-reasoning-action cycle shows how modern LLMs can interact with their environment in ways that closely mimic human problem-solving approaches.
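A highly simplified sketch of that perception-reasoning-action cycle, with every step stubbed out, could look like the following. A real computer-use agent would capture actual screenshots and drive a real browser; the hand-off to a human for the sensitive step mirrors the behavior described above.

```python
# Every function here is a hypothetical stand-in for illustration only.
def perceive():
    # Stand-in for taking a screenshot of the current screen state.
    return "login page with username field, password field, submit button"

def reason(observation, goal):
    # A real agent would prompt an LLM with the screenshot and its goal.
    if "password field" in observation:
        return {"action": "ask_human", "why": "entering credentials is a sensitive step"}
    return {"action": "click", "target": "submit button"}

def act(decision):
    if decision["action"] == "ask_human":
        print("Handing control back to the user:", decision["why"])
        return True                   # stop the loop and wait for the human
    print("Performing:", decision)
    return False

goal = "Log in to the dashboard and download the report"
done = False
while not done:
    observation = perceive()          # perception
    decision = reason(observation, goal)   # reasoning
    done = act(decision)              # action (or hand-off to a human)
```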
I hope you now understand the basics of agents and LLMs, and how these two frameworks can be combined. Now, let's briefly explore the components that make up LLM-based agents.
The agentic LLM is the brain, and the tools are the body parts. The former thinks, reasons, and analyzes, while the latter performs tasks.
Components of LLM-based Agents
Let's explore the key components that make up an effective LLM-based agent. These components work together like parts of a brain, each handling different aspects of intelligent behavior.
Memory systems: LLM agents need both short-term and long-term memory. The context window serves as short-term memory during conversations. Modern models have expanded it significantly: Claude 3.7, a hybrid reasoning model, offers up to 200k tokens, while Gemini 2.5 Pro by Google DeepMind offers a 1 million token context window. For long-term memory, agents use external databases to recall important information from past interactions.
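As a rough illustration, here is a sketch of the two memory types reduced to plain Python structures: a bounded buffer standing in for the context window and a naive keyword search standing in for a vector database. Both the "facts" and the retrieval method are made up for the example.

```python
from collections import deque

SHORT_TERM = deque(maxlen=5)     # only the last few turns fit in the "context window"
LONG_TERM = []                   # persistent store of facts from past sessions

def _words(text):
    return set(text.lower().replace(".", " ").replace(",", " ").split())

def remember(fact):
    LONG_TERM.append(fact)

def recall(query):
    # Stand-in for vector similarity search: naive keyword overlap.
    query_words = _words(query)
    return [fact for fact in LONG_TERM if query_words & _words(fact)]

def chat_turn(user_message):
    SHORT_TERM.append(user_message)
    # The prompt an LLM would see: recent turns plus retrieved long-term memories.
    return {"recent_turns": list(SHORT_TERM), "retrieved_memories": recall(user_message)}

remember("The user's preferred airline is KLM.")
remember("The user dislikes window seats.")
print(chat_turn("Book me a flight to Berlin with my usual airline."))
```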
Planning and reasoning: Agents need to break down complex goals into manageable steps. This happens through chain-of-thought reasoning, where the model works through problems step by step instead of jumping to conclusions. Google's Gemini models specifically focus on this "thinking" capability (Google introduces Gemini 2.5 Pro), as do many other models such as Claude 3.7 and OpenAI's Deep Research.
Tool use: An agent becomes truly useful when it can interact with external tools and APIs. This allows it to perform calculations, search the web, or control devices. Models like Mistral have "native function calling" capabilities to work seamlessly with external systems (Mistral Large 2 foundation model). The Model Context Protocol by Anthropic is also worth mentioning: it is an open protocol that lets a model connect to various applications, tools, external databases, and so on.
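For a feel of how tools are exposed to a function-calling model, here is a sketch of a tool description in the common JSON-Schema style. The exact wrapper differs per provider, and the `get_weather` tool is purely hypothetical.

```python
import json

# A tool is typically described to the model by a name, a description, and a
# JSON-Schema for its parameters.
weather_tool = {
    "name": "get_weather",                       # hypothetical tool
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

# The schema is sent along with the user message; the model replies either with plain
# text or with a structured call like the one below, which the host application then
# executes against the real API.
example_model_reply = {"name": "get_weather", "arguments": {"city": "Berlin", "unit": "celsius"}}
print(json.dumps(weather_tool, indent=2))
print(json.dumps(example_model_reply, indent=2))
```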

Self-evaluation: The ability to review and improve one's own outputs is crucial for reliable agents. Studies show that when LLM agents reflect on mistakes and try again, their problem-solving improves significantly (Self-Reflection in LLM Agents).
Environment perception: Agents need to understand their surroundings to make good decisions. For example, Google's Project Mariner lets AI agents see and interact with web browsers, adapting their actions based on what they observe (Google introduces Gemini 2.0). I have already mentioned Operator by OpenAI above.
With such examples, you can clearly see how versatile an LLM becomes when it inherits agentic behavior.
“Open technologies like the Model Context Protocol are the bridges that connect AI to real-world applications, ensuring innovation is accessible, transparent, and rooted in collaboration. We are excited to partner on a protocol and use it to build agentic systems, which remove the burden of the mechanical so people can focus on the creative.” – Dhanji R. Prasanna, Chief Technology Officer at Block.
As we've seen throughout this article, LLMs with agentic behavior are changing how we work with AI. These systems aren't just answering our questions – they're helping us get things done by planning, using tools, and making smart choices on their own.
So what's coming for the rest of 2025? We'll likely see AI assistants that can handle longer conversations and more complex tasks. They'll get better at understanding images and videos alongside text. Take a look at the new text-to-image generation capabilities of GPT-4o; they are absolutely addicting.
Vibe coding is an evolving idea of coding just by speaking. Coding tools like Cursor, paired with an LLM of your choice such as Sonnet 3.7, can help you create an app just by talking.

Models like Claude, GPT, and Gemini will spend more time "thinking" before they answer tough questions, making their responses more accurate. Perplexity will launch Comet, its agentic web browser. Many exciting things are about to happen in the remaining three quarters of 2025.
Excited to see how it unfolds.
The goal isn't to replace human thinking with AI. Instead, we aim to create helpers that handle routine tasks. This way, we can focus on being creative, solving problems, and doing what humans do best.