Why Did AI Start Forming Teams?
ChatGPT is intelligent. But one AI is no longer enough. Just as humanity built civilization through division of labor and organization, AI is beginning to move in the same direction. This article explores the origins of multi-agent systems.
Introduction
ChatGPT is intelligent.
Claude is intelligent.
Gemini is intelligent.
Over the past few years, large language models have demonstrated capabilities that exceeded the expectations of many people.
They can write.
They can code.
They can translate.
They can summarize.
They can debate.
They can plan.
In some cases, they can even produce answers approaching the level of domain experts.
When ChatGPT was released at the end of 2022, many people began to wonder:
"Have we essentially solved artificial intelligence?"
The answer turned out to be no.
Models became larger.
Performance improved.
Context windows expanded.
GPT-3 became GPT-4.
Claude 2 became Claude 3.
Gemini 1.5 introduced a context window of up to one million tokens.
And yet many of the fundamental problems remained.
AI struggled to complete complex projects from beginning to end.
It struggled to maintain long-term plans.
It struggled with tasks that crossed multiple domains of expertise.
The larger and more complicated the task became, the more unstable its performance often appeared.
Most importantly, a single AI was still attempting to do everything alone.
Human civilization did not emerge from individuals working in isolation.
No single person built an airplane.
No single person built the Internet.
No single person runs a modern corporation.
What enabled humanity to progress was not intelligence alone.
It was cooperation.
It was division of labor.
It was organization.
And interestingly, AI now appears to be moving in the same direction.
The rise of multi-agent systems is not merely another technological trend.
In many ways, it represents AI rediscovering principles that human civilization discovered thousands of years ago.
Why did AI begin forming teams?
Why was a single AI no longer enough?
Why are structures resembling human organizations beginning to emerge in the world of artificial intelligence?
Before discussing frameworks such as AutoGen and CrewAI, it is worth examining the deeper question behind them.
Why did AI begin to cooperate?
Perhaps this is not simply a story about technological progress.
Perhaps it reflects a more universal challenge faced by intelligence itself.
Chapter 1 — Why Did Humans Create Organizations?
Before discussing AI, we should first look at human history.
What is happening in AI today closely resembles problems humanity has encountered many times before.
Organizations are so common in modern society that we rarely stop to think about them.
Companies. Universities. Hospitals. Governments. Research institutions.
We live inside organizations.
Yet for most of human history, large-scale organizations did not exist.
Hunter-gatherer societies were typically small groups consisting of a few dozen individuals.
Within such groups, people performed many different roles.
They searched for food. Built shelters. Maintained tools. Raised children. Protected the community.
The concept of a specialist barely existed.
As civilization developed, however, this began to change.
Agriculture emerged. Permanent settlements appeared. Populations increased. Cities were formed.
As societies became larger and more complex, it became increasingly difficult for individuals to do everything themselves.
Some people specialized in farming. Others specialized in construction. Others specialized in trade.
Specialization emerged.
One of the most influential thinkers to describe this process was Adam Smith.
In The Wealth of Nations, Smith introduced his famous example of a pin factory.
If a single craftsman attempted to manufacture pins from start to finish, productivity would remain limited. But if the work were divided into specialized tasks, production would increase dramatically.
One worker stretched the wire. Another cut it. Another sharpened the tip. Another packaged the finished product.
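By concentrating on specific tasks, workers collectively achieved far greater output than any individual could produce alone. By Smith's account, ten workers dividing the labor could make upwards of 48,000 pins a day, while a single worker doing every step alone might not make twenty.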
This idea later became one of the foundations of the Industrial Revolution. It also became a foundation of modern corporations. And ultimately, a foundation of modern science itself.
What is important is that individual intelligence did not suddenly become greater. Human brains did not fundamentally change. What changed was organizational structure. Division of labor increased the capabilities of the group.
The same insight went on to shape fields ranging from economics to sociology and cognitive science.
We often think of knowledge as something that exists inside individuals. In reality, much of human knowledge exists across society.
Very few people fully understand how a modern smartphone works.
There are CPU designers. Operating system engineers. Communications specialists. Semiconductor researchers.
Each understands only part of the whole. The complete system exists across the organization.
Human civilization was not built solely through individual intelligence.
It was built through organized intelligence.
And this perspective will become critically important when we begin examining the challenges faced by AI.
Chapter 2 — The Limits of Individual Intelligence
We often think about AI as if it were an individual.
ChatGPT. Claude. Gemini. A single model. A single mind. A single intelligence.
Yet this perspective contains a hidden assumption.
It assumes that powerful intelligence must exist as a single entity.
Human society suggests otherwise.
No matter how brilliant a researcher may be, they cannot build a modern smartphone alone.
No matter how capable a CEO may be, they cannot run a global corporation entirely by themselves.
No matter how talented a scientist may be, they cannot master every field of human knowledge.
There are limits to capability. Limits to attention. Limits to memory. Limits to time.
AI faces similar constraints.
Modern models demonstrate extraordinary abilities, but they are far from omnipotent.
In fact, as their capabilities have increased, new limitations have become increasingly visible.
Consider software development.
A single model can often handle a program consisting of a few hundred lines of code.
But a project containing hundreds of thousands of lines is different.
Architecture must be designed. Code must be implemented. Tests must be written. Reviews must be conducted. Systems must be maintained.
Each task requires a different perspective.
It is possible to ask one model to do everything. But that does not necessarily mean it is efficient.
As many developers have experienced, forcing a massive project into a single conversation often produces unstable results.
The model loses track of context. It forgets earlier decisions. It proposes contradictory designs.
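In a previous article, we discussed the phenomenon known as Lost in the Middle: models tend to use information at the beginning and end of a long context more reliably than information buried in the middle.
Many people assumed that larger context windows would solve this problem. They did not.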
As information grows, a new challenge emerges: which information deserves attention?
This is not merely a technical limitation.
It may reflect a deeper structural challenge inherent to intelligence itself.
Humans face the same problem.
More information does not automatically lead to better decisions. Attention remains limited. Prioritization remains necessary.
And a single intelligence, no matter how powerful, can only process so much at once.
Chapter 3 — AI Continued to Acquire New Capabilities
Although the limitations of individual intelligence were becoming increasingly apparent, this did not mean that progress in AI had stopped.
In many ways, the opposite was true.
Between 2022 and 2025, AI systems acquired new capabilities at a remarkable pace.
Models that had once functioned primarily as text generators gradually began to resemble systems capable of acting in the world.
This transformation is essential for understanding the emergence of the Agent era.
Early large language models were, at their core, prediction systems.
They answered questions. They generated text. They translated languages. They summarized information.
These tasks were impressive, but they remained fundamentally confined to language generation.
The intelligence existed entirely inside the model. It could describe actions, but it could not perform them.
The situation began to change when researchers started connecting language models to external environments.
One of the most influential examples was ReAct.
As discussed in an earlier article, ReAct combined reasoning and acting into a single framework.
Instead of simply thinking about a problem, an AI could now interact with its environment.
It could search for information. Retrieve data. Observe outcomes. Then reason again based on what it had learned.
Prior to ReAct, language models were typically asked to solve problems using only their own internal knowledge.
After ReAct, interaction with the external world became part of the reasoning process itself.
This was more than a performance improvement. It represented a shift in how intelligence was being conceptualized.
Humans do not solve most problems entirely inside their heads.
We consult books. We search the web. We ask experts. We conduct experiments. We observe reality.
Human reasoning is deeply intertwined with interaction. ReAct brought this principle into the world of AI.
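The reason, act, observe cycle can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from the ReAct paper: `toy_policy` stands in for a language model, and the `lookup` tool and its one-entry knowledge base are hypothetical.

```python
# Hypothetical knowledge base standing in for an external environment.
KNOWLEDGE = {"capital of France": "Paris"}


def lookup(query: str) -> str:
    """Toy tool: return a stored fact, or a miss message."""
    return KNOWLEDGE.get(query, "no result")


def toy_policy(question: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for a language model. Returns (action, argument)."""
    if not observations:
        return ("lookup", question)       # nothing observed yet: go and look
    return ("finish", observations[-1])   # answer from the latest observation


def react_loop(question, policy, tools, max_steps=5):
    observations = []
    for _ in range(max_steps):
        action, argument = policy(question, observations)  # reason
        if action == "finish":
            return argument
        observations.append(tools[action](argument))       # act, then observe
    return "gave up"


print(react_loop("capital of France", toy_policy, {"lookup": lookup}))  # prints Paris
```

The point of the sketch is the shape of the loop: each observation is fed back into the next decision, so the environment becomes part of the reasoning.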
Soon afterward came Toolformer.
Toolformer addressed a different limitation. Rather than forcing a model to solve every problem internally, it demonstrated that AI could learn when and how to use external tools.
Calculators. Search engines. Translation systems. Knowledge bases. External APIs.
For decades, intelligence had often been imagined as something self-contained. The more knowledge a system possessed internally, the more intelligent it was assumed to be.
Toolformer challenged that assumption.
Humans rarely perform complex arithmetic mentally. We use calculators. We do not memorize entire encyclopedias. We search for information when needed.
Intelligence is not merely the possession of information. It is also the ability to access the right resources at the right time.
Toolformer introduced this idea into modern AI systems.
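A small sketch may make the idea concrete. Toolformer's actual contribution is a training procedure that teaches a model where to insert tool calls; the code below covers only the simpler execution side, and it assumes a hypothetical `[Tool(argument)]` marker syntax and a toy `calculator` function that are not taken from the paper.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def calculator(expression: str) -> str:
    """Toy calculator for expressions of the form 'a op b'."""
    left, op, right = expression.split()
    return f"{OPS[op](float(left), float(right)):g}"


TOOLS = {"Calculator": calculator}             # hypothetical tool registry
CALL = re.compile(r"\[(\w+)\((.*?)\)\]")       # matches [Name(argument)]


def run_tool_calls(text: str) -> str:
    """Replace each recognized tool marker in the text with the tool's result."""
    def replace(match):
        tool = TOOLS.get(match.group(1))
        # Unknown tools are left untouched rather than guessed at.
        return tool(match.group(2)) if tool else match.group(0)
    return CALL.sub(replace, text)


print(run_tool_calls("The total is [Calculator(12 * 7)] units."))
# prints: The total is 84 units.
```

The model does not need to know the answer. It only needs to know that a calculator exists and when to reach for it.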
Another important step came with Reflexion.
Reflexion introduced a capability that humans rely upon constantly but which traditional language models lacked. Reflection.
As discussed previously, Reflexion demonstrated that performance could improve even without updating model weights.
An agent could examine its failures. Extract lessons. Modify future behavior. Then try again.
This process resembles how humans learn in everyday life.
We make mistakes. We analyze what went wrong. We adjust our approach. We improve.
Reflexion transformed this cycle into a structured framework for AI.
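In code, the cycle might look roughly like the following. It is a minimal sketch, not the Reflexion implementation: `toy_actor`, `evaluate`, and `reflect` are hypothetical stand-ins for model calls, and the "lesson" is a hard-coded string. What it does preserve is the key property that improvement comes from a growing memory of verbal feedback, not from changing any weights.

```python
def toy_actor(task: str, lessons: list[str]) -> str:
    """Stand-in for a model: forgets the unit unless a lesson reminds it."""
    answer = "42"
    if "include the unit" in lessons:
        answer += " km"
    return answer


def evaluate(answer: str) -> bool:
    """Toy success check: the answer must carry a unit."""
    return answer.endswith("km")


def reflect(answer: str) -> str:
    """Stand-in for self-reflection: turn a failure into a verbal lesson."""
    return "include the unit"


def reflexion(task: str, max_trials: int = 3):
    lessons = []  # memory of verbal feedback; no model weights are updated
    for trial in range(1, max_trials + 1):
        answer = toy_actor(task, lessons)
        if evaluate(answer):
            return answer, trial
        lessons.append(reflect(answer))  # carry the lesson into the next trial
    return None, max_trials


print(reflexion("How far is it?"))  # prints ('42 km', 2)
```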
Then came Voyager.
Voyager moved even further toward the idea of continuous growth.
Operating inside Minecraft, Voyager demonstrated that an AI could continuously acquire and accumulate skills.
It gathered resources. Crafted tools. Explored new environments. Learned new abilities.
Most importantly, successful behaviors were stored and reused. Experience was no longer discarded after a task ended. Past actions influenced future actions. Knowledge accumulated over time. Growth became possible.
This resembled one of the most fundamental characteristics of human development.
People do not start from scratch every morning. We carry experience forward. Skills accumulate. Knowledge compounds. Past successes and failures shape future decisions.
Voyager introduced a version of this process into AI systems.
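The store-and-reuse idea can be illustrated with a toy skill library. This is a simplified sketch under stated assumptions: Voyager stores executable code and retrieves it by description, whereas the hypothetical `SkillLibrary` below just keeps lists of steps keyed by exact task name, and `toy_discover` is assumed to always succeed.

```python
class SkillLibrary:
    """Toy store of skills that worked, keyed by task name."""

    def __init__(self):
        self._skills = {}

    def add(self, name: str, steps: list[str]) -> None:
        self._skills[name] = list(steps)

    def get(self, name: str):
        return self._skills.get(name)

    def __len__(self) -> int:
        return len(self._skills)


def toy_discover(task: str) -> list[str]:
    """Hypothetical exploration step, assumed here to always succeed."""
    return [f"gather materials for {task}", f"craft {task}"]


def solve(task: str, library: SkillLibrary, discover):
    """Reuse a stored skill if one exists; otherwise learn it and store it."""
    steps = library.get(task)
    if steps is not None:
        return steps, "reused"
    steps = discover(task)
    library.add(task, steps)  # experience is kept, not discarded
    return steps, "learned"


library = SkillLibrary()
print(solve("wooden pickaxe", library, toy_discover)[1])  # prints learned
print(solve("wooden pickaxe", library, toy_discover)[1])  # prints reused
```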
Around the same time, BabyAGI tackled another important challenge. Task management.
Rather than focusing on a single action, BabyAGI attempted to manage entire sequences of objectives.
A goal would be established. Tasks would be generated. Priorities would be assigned. Results would be evaluated. New tasks would then be created based on previous outcomes.
This pushed AI closer to functioning as a persistent problem-solving system rather than a simple question-answering tool.
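The loop of executing a task, generating follow-up tasks, and reprioritizing can be sketched as follows. This is an illustration of the pattern rather than BabyAGI's code: `execute`, `create_tasks`, and `prioritize` are hypothetical stand-ins for model calls, and alphabetical sorting substitutes for a real ranking.

```python
from collections import deque


def execute(objective: str, task: str) -> str:
    """Stand-in for a model carrying out one task."""
    return f"{task}: done for '{objective}'"


def create_tasks(task: str, result: str) -> list[str]:
    """Toy task creation: only the first task spawns follow-ups."""
    if task == "outline report":
        return ["write summary", "draft section 1"]
    return []


def prioritize(tasks) -> deque:
    """Toy prioritization: alphabetical order stands in for a model's ranking."""
    return deque(sorted(tasks))


def run(objective: str, first_task: str, max_iterations: int = 10) -> list[str]:
    queue = deque([first_task])
    completed = []
    for _ in range(max_iterations):
        if not queue:
            break
        task = queue.popleft()
        result = execute(objective, task)
        completed.append(task)
        queue.extend(create_tasks(task, result))  # new tasks from the outcome
        queue = prioritize(queue)
    return completed


print(run("publish a report", "outline report"))
# prints ['outline report', 'draft section 1', 'write summary']
```

The `max_iterations` cap is a deliberate choice in this sketch: a loop that creates its own work needs an explicit stopping condition.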
Looking back, an interesting pattern becomes visible.
Each of these systems introduced a capability that humans rely upon every day.
ReAct introduced action. Toolformer introduced tool use. Reflexion introduced reflection. Voyager introduced growth. BabyAGI introduced planning and task management.
Individually, these projects addressed different research questions.
Collectively, however, they were moving in the same direction.
They were gradually transforming language models into increasingly autonomous agents.
The goal was no longer merely to generate text. The goal was to solve problems. To act. To adapt. To improve. To pursue objectives over time.
Yet despite all of these advances, a significant limitation remained.
And surprisingly, it was a limitation that looked remarkably human.
Chapter 4 — Why Was a Single Agent No Longer Enough?
As agent research progressed, researchers began to notice something unexpected.
Adding more capabilities was not enough.
Agents could act. They could use tools. They could reflect. They could plan. They could learn from experience.
Yet many complex problems remained difficult.
The reason was surprisingly simple.
The agent was still trying to do everything alone.
Consider the process of conducting research.
A researcher begins by reviewing prior work. They read papers. Organize findings. Identify gaps in the literature. Form hypotheses. Design experiments. Analyze results. Write manuscripts. Review arguments. Revise drafts.
In practice, very few large research projects are completed entirely by a single person.
Research groups exist. Collaborators exist. Reviewers exist. Scientific communities exist.
Knowledge and labor are distributed across many individuals.
Software development follows the same pattern.
Architects design systems. Developers write code. Testers verify functionality. Security specialists evaluate risks. Operations teams maintain infrastructure.
The larger a project becomes, the more important specialization becomes.
The reason is cognitive load.
Human cognition has limits. AI systems have limits as well. And interestingly, many of those limits resemble human ones.
As context grows, attention becomes diluted. As tasks multiply, accuracy often declines. As responsibilities expand, decision-making becomes increasingly unstable.
Earlier, we discussed the Lost in the Middle phenomenon. In many ways, it reflects the same underlying issue.
More information is not automatically better. What matters is identifying the information that deserves attention. When a single model attempts to manage everything simultaneously, this challenge becomes increasingly severe.
Initially, many researchers believed that scaling would solve the problem.
Larger models. Longer context windows. More compute. More data.
For a time, this seemed plausible. Yet reality proved more complicated.
Large contexts introduced new complexity. Massive tasks introduced new management burdens. Long reasoning chains created new opportunities for error.
The situation closely resembled the challenges faced by growing organizations.
A company does not solve every problem simply by hiring more employees.
As it grows, new structures become necessary. Departments emerge. Responsibilities become formalized. Communication systems are created. Management hierarchies appear. Coordination becomes a challenge of its own.
The same pattern began to appear in AI systems.
As capabilities increased, organizational problems became increasingly important.
Researchers gradually realized that the bottleneck was not merely model capability. The bottleneck was structure.
Perhaps the fundamental assumption itself was flawed.
Perhaps intelligence should not be viewed as a single agent becoming increasingly powerful.
Perhaps it should be viewed as multiple specialized agents cooperating toward shared goals.
Human civilization did not emerge from isolated geniuses.
It emerged from organized intelligence. From distributed expertise. From coordinated effort.
The same possibility was beginning to emerge in AI.
What followed was not merely another improvement in language models.
It was a shift in the unit of intelligence itself.
Instead of asking how powerful a single agent could become, researchers began asking a different question.
What if intelligence worked better as a team?
That question would eventually lead to the rise of multi-agent systems.
Chapter 5 — AI Became a Team
As the limitations of single-agent systems became increasingly apparent, researchers arrived at a surprisingly natural question.
If one AI could not efficiently do everything, why not allow multiple AIs to cooperate?
Today, this idea sounds obvious.
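It was not entirely new, either: multi-agent systems had been studied as a field in their own right for decades (Wooldridge, 2009).
Yet the mainstream of modern AI research, and large language model research in particular, had largely focused on creating increasingly capable individual systems.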
Larger models. More parameters. More data. More compute.
The underlying assumption was simple.
If intelligence could be made sufficiently powerful, a single system would eventually be capable of handling increasingly complex tasks on its own.
Human history suggests otherwise.
Modern society is not built upon a single extraordinary individual.
It is built upon networks of specialists.
Engineers design aircraft. Material scientists develop alloys. Manufacturing experts build components. Safety engineers verify reliability. Maintenance teams keep systems operational.
Each person possesses only a small fraction of the total knowledge required. No individual fully understands the entire system.
Yet collectively, they create technologies far beyond the capabilities of any single person.
This idea was explored extensively by cognitive scientist Edwin Hutchins through the concept of distributed cognition.
According to this perspective, intelligence does not reside solely inside individual minds.
It emerges through interactions between people, tools, documents, procedures, and organizations.
The unit of cognition is often larger than the individual.
The same possibility began to emerge within AI.
Perhaps intelligence did not need to be concentrated inside a single model.
Perhaps multiple agents could cooperate in ways that resembled human organizations.
This approach offered several important advantages.
The first was the distribution of cognitive load.
Instead of asking one agent to manage every responsibility, tasks could be divided among multiple agents.
One agent could focus on planning. Another could conduct research. Another could write. Another could review.
Each agent would operate within a narrower scope and therefore face fewer competing demands.
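A minimal sketch of such a division of labor is shown below. The four roles follow the text above, but everything else is an assumption made for illustration: each `Agent` wraps a plain function instead of a model call, and the hand-off is a simple sequential pipeline, which is only one of many possible structures.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    role: str
    act: Callable[[str], str]  # stand-in for a role-specific model call


def run_pipeline(agents: list[Agent], request: str) -> list[tuple[str, str]]:
    """Pass work down the line; each agent sees only the previous output."""
    trace = []
    message = request
    for agent in agents:
        message = agent.act(message)
        trace.append((agent.role, message))
    return trace


team = [
    Agent("planner", lambda m: f"plan[{m}]"),
    Agent("researcher", lambda m: f"notes[{m}]"),
    Agent("writer", lambda m: f"draft[{m}]"),
    Agent("reviewer", lambda m: f"reviewed[{m}]"),
]

for role, output in run_pipeline(team, "topic"):
    print(role, "->", output)
# last line printed: reviewer -> reviewed[draft[notes[plan[topic]]]]
```

Because each agent receives only its predecessor's output, no single agent has to hold the entire job in view at once.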
The second advantage was specialization.
Human productivity increased dramatically when labor became specialized. The same principle appeared applicable to AI.
An agent optimized for research might behave differently from one optimized for criticism.
An agent focused on implementation might reason differently from one focused on evaluation.
Rather than creating a single generalist, it became possible to create a team of specialists.
The third advantage was verification.
This proved especially important.
Throughout this series, we have repeatedly encountered the limitations of AI systems.
Hallucinations. Reasoning errors. False confidence.
Single-agent systems often lacked effective mechanisms for detecting their own mistakes.
A team changes that dynamic.
One agent can challenge another. One agent can review another's conclusions. One agent can identify weaknesses overlooked by another.
This resembles peer review in science. Code review in software engineering. Editorial review in publishing.
Human organizations rely heavily on internal verification processes.
Multi-agent systems offered the possibility of introducing similar structures into AI.
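As a rough sketch of what such a structure could look like, the code below runs one draft past two independent reviewers. The reviewer names, their checks, and the `[source: ...]` marker are all hypothetical; in a real system each reviewer would be a separate model call, not a string test.

```python
def fact_checker(draft: str) -> list[str]:
    """Toy reviewer: objects if the draft cites no source."""
    return [] if "[source:" in draft else ["missing source"]


def editor(draft: str, limit: int = 80) -> list[str]:
    """Toy reviewer: objects if the draft exceeds a length limit."""
    return [] if len(draft) <= limit else ["too long"]


REVIEWERS = {"fact-checker": fact_checker, "editor": editor}


def review(draft: str, reviewers=REVIEWERS) -> list[str]:
    """Collect objections from every reviewer; an empty list means approval."""
    objections = []
    for name, reviewer in reviewers.items():
        objections += [f"{name}: {issue}" for issue in reviewer(draft)]
    return objections


print(review("Specialization raised output."))
# prints ['fact-checker: missing source']
print(review("Specialization raised output. [source: placeholder]"))
# prints []
```

Each reviewer looks for a different kind of weakness, so a flaw that one overlooks can still be caught by another.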
Of course, none of this guaranteed correctness. Adding more agents does not automatically produce better answers. As we will see, entirely new problems emerge.
Nevertheless, multi-agent systems enabled organizational structures that were impossible within a purely single-agent framework.
Gradually, AI research began moving toward a new phase.
The goal was no longer merely to build smarter individuals.
The goal increasingly became understanding how intelligence could be organized.
In many ways, AI was beginning to follow the same path humanity had followed for thousands of years.
Chapter 6 — The Problems Created by Organization
Yet organization always comes with costs.
Human history makes this abundantly clear.
Organizations are powerful. But they are also complex.
A company with one employee does not require meetings.
A company with ten thousand employees cannot function without them.
As organizations grow, communication becomes necessary. Decision-making becomes necessary. Responsibility must be assigned. Information must be shared. Coordination becomes a problem in its own right.
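One simple way to see why coordination becomes a problem of its own is to count potential communication channels. The sketch below assumes, purely for illustration, that any two members may need to talk directly, which gives n(n-1)/2 pairs for n members. Real organizations avoid this with hierarchy, and that is exactly the point.

```python
def pairwise_channels(members: int) -> int:
    """Number of distinct pairs among `members` participants: n(n-1)/2."""
    return members * (members - 1) // 2


for n in (1, 10, 10_000):
    print(n, pairwise_channels(n))
# prints:
# 1 0
# 10 45
# 10000 49995000
```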
The same challenge appeared in AI systems.
Creating multiple agents was only the beginning.
Researchers quickly discovered that cooperation introduced new questions.
How should agents communicate? How should information be shared? Who should make decisions? How should responsibilities be assigned? What happens when one agent makes a mistake? Who corrects it?
These problems did not exist in traditional single-agent systems.
Human societies have spent thousands of years developing mechanisms to address similar challenges.
Hierarchies. Management structures. Meetings. Reports. Audits. Review processes.
These are not merely administrative procedures. They are cognitive infrastructure. They enable large groups of individuals to function as coordinated systems.
AI organizations would require similar mechanisms.
An even more intriguing possibility soon became apparent.
AI might inherit not only the strengths of human organizations, but also their weaknesses.
Groupthink. Diffusion of responsibility. Information distortion. Communication failures. Excessive conformity.
As discussed in previous articles, many human limitations become amplified within social structures.
The same could potentially happen in AI systems.
Multiple agents might reinforce one another's mistakes. They might share incorrect assumptions. They might converge on flawed conclusions. They might fail to challenge one another effectively.
Human history provides countless examples of such failures.
Organizations are not merely engines of intelligence. They can also become engines of collective error.
This realization carried an important lesson.
Organization was not a universal solution. It solved certain problems while creating new ones.
Yet despite these risks, researchers continued moving toward multi-agent systems.
The reason was simple. Complex problems increasingly demanded cooperation.
Human civilization never abandoned organizations despite their flaws.
Businesses never abandoned teamwork despite inefficiencies.
Science never abandoned collaboration despite disagreements.
The benefits outweighed the costs.
AI appeared to be reaching the same conclusion.
As tasks became more complex, cooperation became increasingly difficult to avoid.
Conclusion
The history of AI began as a search for increasingly capable individuals.
Larger models. Greater performance. Longer context windows. More sophisticated reasoning.
That pursuit continues today.
Yet recent developments suggest a different possibility.
Intelligence does not necessarily have to exist as a single entity.
It can cooperate. It can specialize. It can organize itself.
Human history should make this unsurprising.
Civilization was not built by isolated geniuses.
It was built by societies that shared knowledge.
Organizations that divided responsibilities.
Communities of specialists working together toward common goals.
Today, AI appears to be moving in the same direction.
This is more than a technological trend.
It may reflect a deeper principle about intelligence itself.
In the first part of this series, we explored how AI inherited many human limitations.
Cognitive biases. Social adaptation. Reasoning failures. Sycophancy. Structural flaws.
Yet AI may also be beginning to inherit one of humanity's greatest strengths.
Cooperation.
Division of labor.
Organization.
The emergence of multi-agent systems represents one of the first major steps in that direction.
The age of the solitary AI may not be ending.
But it is increasingly being joined by something new.
The age of organized intelligence.
Next Article
As AI systems began forming teams, a new challenge emerged.
How should they communicate?
How should they coordinate their actions?
And when multiple AIs engage in discussion, do they actually move closer to better answers?
The transition from single agents to multi-agent systems began with a surprisingly simple idea:
Conversation.
In the next article, we will explore:
Why Did AI Start Talking to Itself?
The story of AutoGen and the beginning of the multi-agent era.
References
Smith, A. (1776). An Inquiry into the Nature and Causes of the Wealth of Nations.
Hutchins, E. (1995). Cognition in the Wild. MIT Press.
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629
Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Hambro, E., Zettlemoyer, L., Cancedda, N., & Scialom, T. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. arXiv:2302.04761
Shinn, N., Cassano, F., Labash, B., & Gopinath, A. (2023). Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv:2303.11366
Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., Fan, L., & Anandkumar, A. (2023). Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv:2305.16291
Nakajima, Y. (2023). BabyAGI. GitHub
Russell, S., & Norvig, P. (2021). Artificial Intelligence: A Modern Approach (4th ed.).
Wooldridge, M. (2009). An Introduction to MultiAgent Systems (2nd ed.).