Tool Calling in AI Agents: A Reality Check

The concept of tool calling in AI agents is essentially just an expensive switch-case system prone to logic errors.

· 5 min read

a computer screen with a bunch of words on it

Last week, one of my autonomous agents burned through $47 in API costs overnight because it decided to call the revenue report function repeatedly. That’s the price you pay for overestimating the “intelligence” of tool calling.

What is tool calling, really?

To put it bluntly, tool calling is just giving the AI a list of buttons and hoping it knows when to press which one.

In a normal chat, an AI model simply calculates the probability of generating the next word. But when you provide it with tools, the AI can read the descriptions of specific code functions—for example, lay_thoi_tiet or gui_email. Then, instead of replying with text, it returns a JSON string containing the function name and the necessary parameters.

Your backend system receives this JSON, executes the corresponding code, and feeds the result back to the AI. Only then does the AI use that result to generate a final answer for the user. It sounds like a powerful technology, but in production, it’s riddled with issues.

The Fatal Flaw: Infinite Loops

I used to think that GPT-5 or Claude Sonnet 4.6 were smart enough to stop themselves when a function call failed. But after three months of real-world use, it turns out they are still quite poor at handling logic errors.

Errors caused by lack of context

If your API returns a 500 error code because the server crashed, the AI doesn’t understand that. Instead of informing the user that the system is down, the AI often tries to “fix” it by changing the parameters and calling the function again. And again. From an initial expectation of 12 API calls, my system once recorded a jump to 184 calls just because it was trying to find a result that didn’t exist.

Latency and Exorbitant Costs

Everything involving tool calling takes time. You are stacking the time it takes to generate the JSON, the actual API execution time, and the time to generate the final response.

The Time and Money Problem

End-users often have to wait 5 to 10 seconds for a simple operation. This reminds me of when I tested Ollama và LLM Local: Có thể thay thế ChatGPT? — running locally is cheap but slow, while using top-tier APIs is both slow and expensive. You’re paying for tokens for the function description input, the generated JSON, and the massive amount of data returned from the function.

★★★★★

Great books on this topic

🛒 Check Prices & Buy Now on Shopee →

* Affiliate link - no extra cost to you

If you are looking for a server powerful enough to host local AI models to reduce API costs, check out our list of specialized workstations here.

When should you use it?

Despite the flaws, we cannot deny that tool calling is the only way for AI to interact directly with the outside world.

Don’t use it for everything

Only use tool calling when data changes constantly or when you need to execute actual commands—such as checking inventory, sending a Slack message, or interacting with a database. If you only need the AI to answer based on existing documents, use standard RAG. Don’t overcomplicate your system architecture.

Comparison Table of Approaches

CriteriaTool CallingTraditional RAGNotes
PurposeExecute actions, fetch real-time dataRead static docs, Q&ARAG is safer
LatencyVery highLow to mediumTool calling requires multiple roundtrips
Error RiskInfinite loops, wrong JSON parametersHallucinations/Incorrect infoNeeds an auto-kill mechanism

How to Set Up Tool Calling Safely

  1. Limit the number of loops: Always hard-code the maximum number of consecutive function calls an AI is allowed to make. A reasonable limit is usually 3. Beyond this, you must force it to stop.
  2. Tight error handling: Don’t return raw system error logs for the AI to read; it will get confused. Catch the error and return a concise message like “API error, please notify the user.”
  3. Crystal clear function descriptions: Don’t name a function get_data. Name it get_user_billing_history. Parameter descriptions must be detailed down to the specific data type.

Frequently Asked Questions

Which model is currently best for tool calling?

Claude Sonnet 4.6 is currently performing the best in adhering to JSON schemas without inventing extra parameters. Gemini 3 Pro responds quickly but occasionally generates data fields that weren’t in the function description.

Should I let the AI run tools automatically without confirmation?

Absolutely not if the tool modifies data—such as deleting files, processing payments, or sending bulk emails. Always include a step where the user clicks a button to confirm. Read Perplexity AI 2026: Đã đến lúc xóa Google chưa? to see how even a massive search system still needs humans to verify final actions.

Which library should I use for tool calling?

Instead of relying on bloated frameworks, you should write short, direct API calls yourself. Clean code is much easier to control, monitor, and debug when the system encounters errors.

Conclusion

Tool calling is not magic. It is just an expensive and precarious chain of JSON analysis. Attempting to cram rigid logical programming into a probability-based prediction engine always comes with high risks. The concept of a fully autonomous AI agent isn’t quite feasible yet. Your system will only run stably once you treat tool calling as a feature that needs strict supervision rather than a silver bullet.

You might also like

← Back to Blog