Improving Performance of Intent-Based Chat Command Execution

4 min read
Yura Shin
AI Specialist

Introduction

Users simply send a sentence to a Chat Agent: “Please start monitoring.” “Set the threshold for people to 0.6.” “Add ‘tree’ to the target list.”

While the interaction appears simple, the internal processing required by the LLM is highly complex.

Before taking any action, the LLM must determine the intent:

“Is this a target-setting task? Scenario editing? Or just querying information?”

Then it must:

  • extract required parameters
  • validate values
  • handle errors gracefully and explain what's wrong

Previously, the system attempted to perform all of these steps in a single LLM call.

Although this looked clean on the surface, it repeatedly caused unpredictable and hard-to-debug problems:

  • Wrong task classification → wrong actions executed
  • Rule conflicts between different tasks
  • Incorrect parameter extraction without validation
  • Exponential growth in maintenance due to entangled rules

To solve these core issues, the Chat Agent was redesigned using a LangGraph-based Multi-Node Routing architecture.




1. Even simple requests are “multi-stage decision-making” for LLMs

The previous Chat Agent tried to interpret everything in one LLM call.

For example, the request:

“Change the threshold for ‘tree’ to 0.3”

Internally required the LLM to:

  1. Identify the type of task
  2. Extract parameters (“tree: 0.3”)
  3. Validate the threshold value
  4. Check configuration conflicts
  5. Judge whether modification is allowed
  6. Respond in natural language

Trying to combine all logic into a single prompt and a single set of rules resulted in:

  • Rules for one task affecting others
  • Parameter parsing failures
  • Small changes requiring full prompt rewrites
  • Hard-coded and exploding error handling logic

At its peak, the prompt reached 3,783 tokens, and it kept growing, becoming more fragile with every new rule.




2. Fundamental issues in the original architecture

The original LLM call served five roles at once:

  • Task classification
  • Parameter parsing
  • Value validation
  • Error handling
  • Natural language generation

This caused multiple structural issues:


2.1 Task rule conflicts

Target labels must be in English for video detection, but this rule was incorrectly applied to scenario descriptions as well, forcing English output even for Korean text.

Result: rules interfering across unrelated tasks.


2.2 Unreliable parameter parsing

Even simple numeric interpretation often failed:

  • “one point five” → interpreted as 0.15
  • Word-form numbers or locale-dependent formats → parsing failures

More edge cases → more instability.


2.3 Every error case required manual rule definitions

The LLM handled all error evaluation, which meant:

  • Every possible error had to be pre-defined
  • Any new parameter → new rules → high maintenance



3. Introducing a Routing-Based Architecture

We rebuilt the system using a 3-Stage LangGraph Routing Pipeline.

Core principle:

One purpose per LLM call. Never ask the LLM to do multiple jobs at once.
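
Before walking through each stage, here is a minimal sketch of how such a pipeline can be wired as a LangGraph state graph. The state fields, node names, and the two example tasks are illustrative assumptions for this post, not the production EVA code:

```python
from typing import Optional, TypedDict

from langgraph.graph import END, START, StateGraph


class AgentState(TypedDict):
    user_input: str       # raw chat message
    system_state: str     # snapshot of the current configuration
    task: str             # label chosen by the Task Routing Node
    params: dict          # values extracted by the task-specific parser
    error: Optional[str]  # set by the system validator, never by the LLM
    reply: str            # final natural-language answer


# Stub nodes for wiring; each one is sketched in the subsections below.
def route_task(state: AgentState) -> dict: ...
def parse_set_target(state: AgentState) -> dict: ...
def parse_start_monitoring(state: AgentState) -> dict: ...
def validate_params(state: AgentState) -> dict: ...
def explain_error(state: AgentState) -> dict: ...


graph = StateGraph(AgentState)
graph.add_node("route_task", route_task)
graph.add_node("parse_set_target", parse_set_target)
graph.add_node("parse_start_monitoring", parse_start_monitoring)
graph.add_node("validate", validate_params)
graph.add_node("explain_error", explain_error)

graph.add_edge(START, "route_task")

# Stage 1 -> Stage 2: hand off to the parser that owns the classified task.
graph.add_conditional_edges(
    "route_task",
    lambda state: state["task"],
    {"set_target": "parse_set_target", "start_monitoring": "parse_start_monitoring"},
)

# Stage 2 -> Stage 3: every parser feeds the same deterministic validator.
graph.add_edge("parse_set_target", "validate")
graph.add_edge("parse_start_monitoring", "validate")

# The error node runs only when the validator has flagged a problem.
# (The per-task success reply is omitted here for brevity.)
graph.add_conditional_edges(
    "validate",
    lambda state: "explain_error" if state["error"] else END,
)
graph.add_edge("explain_error", END)

app = graph.compile()
```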


3.1 Task Routing Node

“Classify the request — and only that”

No parsing. No validation. No rule application.

Minimal responsibility → maximal reliability.

To pick the correct task, it uses only:

  • The current request text
  • The list of available tasks
  • The existing system state
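
As a sketch, the routing node is a single constrained LLM call that returns a task label and nothing else. The model name, prompt wording, and task labels below are assumptions for illustration:

```python
from langchain_openai import ChatOpenAI

TASKS = ["set_target", "start_monitoring", "edit_scenario", "query_status"]

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)


def route_task(state: AgentState) -> dict:
    """Classify the request, and only that. No parsing, no validation."""
    prompt = (
        "Classify the user request into exactly one task.\n"
        f"Available tasks: {', '.join(TASKS)}\n"
        f"Current system state: {state['system_state']}\n"
        f"Request: {state['user_input']}\n"
        "Answer with the task name only."
    )
    label = llm.invoke(prompt).content.strip()
    # Fall back to a read-only task if the model answers off-list.
    return {"task": label if label in TASKS else "query_status"}
```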

3.2 Task-Specific Parameter Parser

“Each task has isolated prompts, parsers, and rules”

Previously:

  • All tasks shared the same prompt → rule entanglement

Now:

  • Each task has its own prompt + parser + rules
  • Fully isolated LLM call

Examples:

  • Set-Target Task → dedicated logic only for targets
  • Start-Monitoring Task → independent logic only for monitoring

No more rule collisions or cross-contamination 🎯
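
For instance, a Set-Target parser can own its prompt and a typed schema, fully isolated from every other task. The sketch below reuses the `llm` from the routing example and assumes LangChain's structured-output support; the schema and field names are illustrative:

```python
from pydantic import BaseModel, Field


class SetTargetParams(BaseModel):
    """Schema owned exclusively by the Set-Target task."""

    label: str = Field(description="Detection target label, in English")
    threshold: float = Field(description="Confidence threshold requested by the user")


# This prompt knows only about targets; its rules cannot leak into
# scenario editing or monitoring, because no other task ever sees it.
target_parser = llm.with_structured_output(SetTargetParams)


def parse_set_target(state: AgentState) -> dict:
    params = target_parser.invoke(
        "Extract the target label and threshold from this request.\n"
        f"Request: {state['user_input']}"
    )
    return {"params": params.model_dump()}
```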


3.3 Error Handling Node

“System validates. LLM explains.”

Process:

  • LLM extracts values
  • System Validator confirms correctness
  • If invalid → Error Node generates user-friendly explanation

Example (threshold 1.5):

  • Parser: threshold: 1.5
  • Validator: Out of allowed range
  • Error Node:

    “Threshold must be between 0.0 and 1.0. Please try again.”

LLM no longer decides errors — it only communicates them.
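
A sketch of this split, under the same assumptions as the earlier examples: plain Python owns the pass/fail decision, and the LLM is invoked only to phrase a failure the system has already decided. The range limit and wording are illustrative:

```python
def validate_params(state: AgentState) -> dict:
    """System rules decide validity deterministically; no LLM involved."""
    threshold = state["params"].get("threshold")
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        return {"error": f"threshold {threshold} is outside the allowed range 0.0-1.0"}
    return {"error": None}


def explain_error(state: AgentState) -> dict:
    """The LLM only communicates the error, in user-friendly language."""
    reply = llm.invoke(
        "Explain this validation error in one friendly sentence and "
        f"ask the user to try again: {state['error']}"
    ).content
    return {"reply": reply}
```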




4. Performance Evaluation

The routing-based design didn't just improve accuracy; it also boosted maintainability, stability, and speed.


4.1 Task & Parameter Accuracy

| Metric                     | Before | After |
| -------------------------- | ------ | ----- |
| Task Routing Accuracy      | 82.3%  | 95.0% |
| Parameter Parsing Accuracy | 69.6%  | 95.0% |

Huge gain thanks to isolating classification and parsing 🎉


4.2 Prompt Length Reduction

| Case | Before       | After        |
| ---- | ------------ | ------------ |
| Min  | 1,603 tokens | 1,106 tokens |
| Max  | 3,783 tokens | 1,793 tokens |

Shorter → more deterministic & reliable LLM reasoning


4.3 Latency Improvement

| Case | Before | After  |
| ---- | ------ | ------ |
| Min  | 1.19 s | 1.50 s |
| Max  | 2.98 s | 2.03 s |

The extra LLM calls add a little latency in the best case, but the worst case improved markedly, because each individual call now runs on a much shorter prompt.




5. Conclusion

Key insight:

The problem wasn’t the LLM — it was how we were using the LLM.

One call doing all tasks → confusion and instability.
Proper division of roles → stable and predictable performance.

Each component now focuses only on its job:

| Role                | Owner                |
| ------------------- | -------------------- |
| Task Classification | Router               |
| Parameter Parsing   | Task-specific Parser |
| Validation          | System Rules         |
| Error Communication | LLM (Error Node)     |

This restructure marks a major milestone: it transforms the EVA Chat Agent into a trustworthy AI control interface.

A more robust foundation means:

  • Easier expansion
  • More accurate automation
  • Better user experience
  • Lower maintenance cost