In our last post of the series, we audited the GitHub Copilot SDK's token usage. We learned how to strip away the "Abstraction Tax" by setting a custom persona and disabling default toolkits, bringing our input overhead down by over 99%.
Now that we have a lean, hyper-efficient engine, it’s time to give our agent some real work.
Traditionally, if you wanted an AI to answer questions based on a corporate document like an employee handbook, you would build a RAG (Retrieval-Augmented Generation) pipeline. You would chunk the text, generate vector embeddings, store them in a database, and perform a mathematical similarity search.
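For contrast, here is a deliberately tiny sketch of that classic pipeline: chunking by paragraph, a stand-in "embedding", and cosine-similarity search over an in-memory list that plays the role of the vector database. The bag-of-words `embed` function is an assumption that replaces a real embedding model, and `HANDBOOK` is made-up sample text, not the handbook used in this post.

```python
import math
import re
from collections import Counter

# Made-up sample text (NOT the real employee handbook).
HANDBOOK = """Maternity leave: notify the company no later than 12 weeks before the expected date of confinement.

Annual leave: submit requests through the HR portal at least two weeks in advance.

Expenses: claims must be filed within 30 days with receipts attached."""


def chunk(text: str) -> list[str]:
    """Naive chunking strategy: one chunk per blank-line-separated paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def embed(text: str) -> Counter:
    """Stand-in 'embedding' (assumption): a bag-of-words term-count vector.
    A real RAG pipeline would call an embedding model here."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the top-k (score, chunk) pairs by cosine similarity to the query."""
    q = embed(query)
    index = [(cosine(q, embed(c)), c) for c in chunks]  # our in-memory "vector database"
    return sorted(index, key=lambda pair: pair[0], reverse=True)[:k]


if __name__ == "__main__":
    chunks = chunk(HANDBOOK)
    print(retrieve("How much notice is needed for maternity leave?", chunks, k=1))
    print(retrieve("What should I do if I am pregnant?", chunks, k=1))  # score 0.0: no shared words
```

Note the second query: a purely lexical vector scores 0.0 everywhere because "pregnant" shares no words with the maternity paragraph. Bridging that gap is exactly what a real embedding model is there for, and how well it does depends on the model and the chunking.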
Today, we are going to bypass traditional RAG completely. By introducing agentic tool execution, we will watch our agent autonomously reason, hunt for data, fail, pivot, and ultimately find the right answer, all on its own.
The Setup: Armed with File Tools
Instead of serving pre-chewed text chunks to our LLM, we are going to hand our agent raw text files and a couple of command-line tools: grep (for pattern searching) and view (for reading files).
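The SDK implements these tools itself, and nothing in this post depends on their internals. Purely to illustrate what the agent gets back from each call, here is a hypothetical local approximation. `grep_tool`, `view_tool`, and the `content` flag (loosely mirroring the `output_mode: 'content'` argument that appears in the telemetry log later in the post) are our own names, not the SDK's API.

```python
import re
from pathlib import Path
from typing import Optional


def grep_tool(pattern: str, path: str, content: bool = False) -> list[str]:
    """Hypothetical stand-in for a grep tool (not the SDK's implementation).
    content=False -> return the file name if anything matches (like `grep -l`).
    content=True  -> return 'line_no:line' for every matching line (like `grep -n`)."""
    regex = re.compile(pattern)  # case-sensitive, like plain grep
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    hits = [f"{i}:{line}" for i, line in enumerate(lines, start=1) if regex.search(line)]
    if content:
        return hits
    return [path] if hits else []


def view_tool(path: str, start: int = 1, end: Optional[int] = None) -> str:
    """Hypothetical stand-in for a view tool: return lines start..end
    (1-indexed, inclusive), each prefixed with its line number."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    selected = lines[start - 1:end]
    return "\n".join(f"{i}. {line}" for i, line in enumerate(selected, start=start))
```

Like plain grep, the sketch is case-sensitive, so a lowercase pattern such as 'maternity' would miss a heading spelled "Maternity". That is one more reason an agent that can retry with different patterns is useful.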
First, let's configure our environment, point our CopilotClient back to our budget-friendly Gemini 3.1 Flash Lite endpoint, and restrict its toolkit to just those two utilities.
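```python
import asyncio

# CopilotClient, PermissionHandler, SystemMessageReplaceConfig, `model`, `provider`
# and the `handle_usage` event handler are set up as in the previous post.

async def main():
    question = "What should I do if I am pregnant?"
    async with CopilotClient() as client:
        async with await client.create_session(
            on_permission_request=PermissionHandler.approve_all,  # auto-approve every tool request
            model=model,
            provider=provider,
            system_message=SystemMessageReplaceConfig(
                mode="replace",  # replace the default system prompt entirely
                content="You are a helpful assistant. Answer queries using `/content/data/EmployeeHandbook.txt`",
            ),
            available_tools=["grep", "view"],  # Giving the agent its tools
        ) as session:
            session.on(handle_usage)  # event handler that prints the telemetry logs
            response = await session.send_and_wait(question, timeout=300)

asyncio.run(main())  # in Colab/Jupyter, run `await main()` instead
```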
By adding available_tools=['grep', 'view'] and setting PermissionHandler.approve_all, we are handing the agent the keys to the workspace: every tool call it requests is approved automatically. We don't tell it which tool to call or when; we give it a goal and step back.
Watching the Agent Reason in Real-Time
When we run this code against a standard HR Employee Handbook, something fascinating happens under the hood. Let's look at the telemetry logs generated by our event handler:
```
System message: {
  "first_line": "You are a helpful assistant. Answer queries using `/content/data/EmployeeHandbook.txt`",
  "content_length": 86
}
Tokens used: {'input': 973, 'output': 15, 'cache_read': 0, 'cache_write': 0}
Tool execution: {'tool_name': 'grep', 'arguments': {'pattern': 'pregnan'}}
Tokens used: {'input': 1003, 'output': 15, 'cache_read': 0, 'cache_write': 0}
Tool execution: {'tool_name': 'grep', 'arguments': {'pattern': 'maternity'}}
Tokens used: {'input': 1036, 'output': 22, 'cache_read': 0, 'cache_write': 0}
Tool execution: {'tool_name': 'view', 'arguments': {'path': '/content/data/EmployeeHandbook.txt'}}
Tokens used: {'input': 1105, 'output': 23, 'cache_read': 0, 'cache_write': 0}
Tool execution: {'tool_name': 'grep', 'arguments': {'pattern': 'maternity', 'output_mode': 'content'}}
Tokens used: {'input': 1346, 'output': 170, 'cache_read': 0, 'cache_write': 0}
Assistant message: If you are pregnant, you should notify the company of your intent to take maternity leave **no later than 12 weeks before the expected date of confinement**. ...
```
The Autonomous Loop Broken Down
Look closely at how the agent reacted when the user asked, "What should I do if I am pregnant?"
- The First Attempt (Grep 'pregnan'): The agent starts by running grep for the stem 'pregnan', which covers both "pregnant" and "pregnancy". Our handbook contains no match for that pattern, so the search comes back empty.
- The Pivot (Query Rewriting): In a traditional keyword search, the process would end here with an unhelpful "I couldn't find anything." An agent, however, can reason about the miss. It connects "pregnant" to "maternity leave", rewrites its search term on its own, and fires a second grep for 'maternity'.
- The Deep Dive (View File): With a hit in hand, it calls the view tool on /content/data/EmployeeHandbook.txt, then runs grep for 'maternity' once more, this time with output_mode set to 'content' to pull the matching lines into its context.
- The Final Synthesis: It combines the relevant passages and delivers a clearly structured answer covering the 12-week notice requirement and the expected date of confinement.
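Stripped of the LLM, the control flow of that loop looks roughly like the sketch below. It is not how the Copilot SDK runs its loop internally: the hypothetical `RELATED_TERMS` table stands in for the model's reasoning about which term to try next, and `grep` is any callable that returns matching lines.

```python
from typing import Callable, Optional

# Hypothetical stand-in for the model's reasoning: candidate terms, in the order tried.
RELATED_TERMS = {"pregnant": ["pregnan", "maternity", "parental leave"]}


def agent_search(
    topic: str,
    grep: Callable[[str], list[str]],
    max_steps: int = 5,
) -> tuple[Optional[list[str]], list[str]]:
    """Try candidate search terms until one returns hits or the step budget runs out.
    Returns (hits or None, trace of the terms that were tried)."""
    trace: list[str] = []
    for term in RELATED_TERMS.get(topic, [topic])[:max_steps]:
        trace.append(term)
        hits = grep(term)  # act: call the tool
        if hits:           # observe: non-empty result, stop and synthesize
            return hits, trace
        # empty result: pivot to the next candidate term
    return None, trace


if __name__ == "__main__":
    handbook = ["Maternity leave: notify the company no later than 12 weeks before confinement."]
    hits, trace = agent_search("pregnant", lambda t: [l for l in handbook if t.lower() in l.lower()])
    print(trace)  # ['pregnan', 'maternity']
    print(hits)
```

In the real run the candidate terms are generated by the model at each step rather than read from a table, which is what lets the agent pivot to terms nobody anticipated.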
Agentic Workflows vs. Traditional RAG
This highlights a fundamental shift in how we handle unstructured data:
The Core Difference: Traditional RAG relies heavily on embeddings to find semantic similarities between terms. If the embedding model or chunking strategy is slightly off, relevant context gets missed. An agentic workflow addresses this through iterative reasoning: it can notice that a tool returned an empty result, rethink its strategy, and try alternative terms on its own.
The Agentic Trade-Off
This extra intelligence isn't free. You will notice two clear trade-offs when moving from single-call API bots to true agents:
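- Higher Token Consumption: Because the agent runs multiple back-and-forth loops, evaluating each tool output and re-submitting the growing context every time, it uses significantly more input tokens per query. In the run above, the five model calls consumed 5,463 input tokens in total, about 5.6 times the 973 tokens of the first call alone.
- Increased Latency: Every tool call adds another full model round trip, and those round trips run one after another. The answer above waited on five sequential model responses instead of one.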
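The token totals can be checked straight from the telemetry. This sketch parses the `Tokens used:` lines copied from the log above with `ast.literal_eval` and sums each field; it assumes only the log format shown here, nothing about the SDK itself.

```python
import ast

# "Tokens used" lines copied verbatim from the telemetry log above.
LOG = """\
Tokens used: {'input': 973, 'output': 15, 'cache_read': 0, 'cache_write': 0}
Tokens used: {'input': 1003, 'output': 15, 'cache_read': 0, 'cache_write': 0}
Tokens used: {'input': 1036, 'output': 22, 'cache_read': 0, 'cache_write': 0}
Tokens used: {'input': 1105, 'output': 23, 'cache_read': 0, 'cache_write': 0}
Tokens used: {'input': 1346, 'output': 170, 'cache_read': 0, 'cache_write': 0}
"""


def total_usage(log: str) -> dict[str, int]:
    """Sum every field of every 'Tokens used: {...}' line in a log."""
    prefix = "Tokens used: "
    totals: dict[str, int] = {}
    for line in log.splitlines():
        if line.startswith(prefix):
            usage = ast.literal_eval(line[len(prefix):])  # safely parse the dict literal
            for key, value in usage.items():
                totals[key] = totals.get(key, 0) + value
    return totals


totals = total_usage(LOG)
print(totals)                           # {'input': 5463, 'output': 245, 'cache_read': 0, 'cache_write': 0}
print(round(totals["input"] / 973, 1))  # 5.6
```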
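The Good News: In production environments, prompt caching can offset much of this cost. Every iteration of the loop re-sends the same growing prefix (system prompt, question, earlier tool results), and that repeated prefix is exactly what a cache can serve. The free-tier Gemini API doesn't have caching enabled by default (note cache_read: 0 on every call in the log above), but production endpoints allow cached system instructions and earlier context to be read at a steep discount, roughly one-tenth of the standard input token price depending on the provider and model. Given the leap in answer reliability, it's a very good deal.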
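As a back-of-the-envelope estimate of what caching could save on this exact run, the sketch below assumes that each call's input begins with the previous call's full input, that this shared prefix is read from cache, and that cache reads cost 0.1x the standard input price (the "roughly one-tenth" figure above). It ignores cache-write charges and the minimum cacheable prompt length most providers impose, so treat the result as an order-of-magnitude illustration, not a bill.

```python
# Per-call input tokens from the telemetry log above.
INPUTS = [973, 1003, 1036, 1105, 1346]
CACHE_READ_MULTIPLIER = 0.1  # assumption: cache reads cost ~1/10 of the standard input price


def billed_input_equivalents(inputs: list[int], multiplier: float) -> float:
    """Estimate billed input, in standard-price token equivalents, assuming each
    call's prefix (taken to be the previous call's full input) is read from cache."""
    billed = 0.0
    previous = 0
    for tokens in inputs:
        cached = min(previous, tokens)  # prefix assumed shared with the previous call
        billed += (tokens - cached) + cached * multiplier
        previous = tokens
    return billed


without_cache = sum(INPUTS)
with_cache = billed_input_equivalents(INPUTS, CACHE_READ_MULTIPLIER)
print(without_cache, round(with_cache), round(with_cache / without_cache, 2))  # 5463 1758 0.32
```

Under those assumptions, the five calls would bill like about 1,758 standard input tokens instead of 5,463, roughly a third of the uncached cost.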
In our next tutorial, we will take this a step further and look at how to build and register our own completely custom Python functions as agent skills.
Try It Yourself
Want to watch the agent hunt through the employee handbook live? Open the interactive Google Colab notebook for this post, run the telemetry audits, and try changing the questions to see how the agent adapts its tool usage!