
8 posts tagged with "Research"

Covers AI research, paper-based technical insights, and real-world application cases.


Performance Enhancement through Instruction Tuning Based on User Feedback Data

· 12 min read
Jaechan Lee
POSTECH
Yura Shin
AI Specialist

This work is a collaborative research effort with Minjoon Son (advised by Prof. Youngmyung Ko) as part of the "Campus: Beyond Safety to Intelligence – Postech Living Lab Project with EVA".


🎯 Introduction: Shifting Feedback from 'Retrospective Correction' to 'Cognitive Enhancement'

When EVA makes judgments based on images, operators often provide specific feedback like: "This is indeed a safety vest. Why did it get confused?" or "Shouldn't there be an alert here?" This feedback contains not just the right or wrong answer, but also the human reasoning and context behind the judgment.

Previously, EVA utilized this feedback by storing it in a separate Vector DB and using it to adjust the Alert status when similar situations occurred. While this approach offered the advantage of quick application, it had a structural limitation: it did not improve the model's intrinsic reasoning capability and merely retrospectively filtered errors.

To fundamentally address this issue, we completely changed our approach. We reconstructed user feedback not as simple error reports, but as Instruction Data that the model can directly use in its inference process to strengthen its Visual Reasoning capability.
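To illustrate what that reconstruction can look like, here is a minimal sketch that turns one operator feedback record into a conversation-style instruction-tuning sample. The record schema and field names are hypothetical, not EVA's actual data format.

```python
import json

def feedback_to_instruction_sample(record: dict) -> dict:
    """Convert one operator feedback record into a VLM instruction-tuning sample.

    `record` uses a hypothetical schema: image, question, model_answer,
    operator_feedback, correct_answer. The original (wrong) model_answer can
    be kept for filtering or for building contrastive pairs.
    """
    # Fold the operator's reasoning into the target so the model learns
    # the "why", not just the corrected label.
    target = (
        f"{record['correct_answer']}\n"
        f"Reasoning: {record['operator_feedback']}"
    )
    return {
        "image": record["image"],
        "conversations": [
            {"role": "user", "content": record["question"]},
            {"role": "assistant", "content": target},
        ],
    }

if __name__ == "__main__":
    sample = feedback_to_instruction_sample({
        "image": "cam_12/frame_0421.jpg",
        "question": "Is the worker wearing a safety vest?",
        "model_answer": "No",
        "operator_feedback": "The orange garment with reflective strips is a safety vest.",
        "correct_answer": "Yes",
    })
    print(json.dumps(sample, indent=2, ensure_ascii=False))
```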

This article will focus on how VLM-based Instruction Tuning utilizing user feedback data overcomes the limitations of the previous Vector DB-centric approach and improves the model's visual reasoning performance.

From Image to Language, From Language to Reasoning: Boosting VLM Performance with Camera Context

· 7 min read
Minjun Son
POSTECH
Jisu Kang
AI Specialist

This work is a collaborative research effort with Minjoon Son (advised by Prof. Youngmyung Ko) as part of the "Campus: Beyond Safety to Intelligence – Postech Living Lab Project with EVA".


📝 Introduction: Making User Queries Smarter: Enhancing Language with Image Context

EVA is a system that detects anomalies using hundreds to thousands of smart cameras. We utilized a VLM/LLM to automatically infer the camera context and embedded this into the prompt, creating a camera-context aware anomaly detection pipeline that reflects the situation of the target image. By leveraging the camera context extracted from a single frame as prior knowledge for the VLM, we confirmed a meaningful improvement in accuracy and deeper interpretability compared to the existing baseline.
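For illustration only, a minimal sketch of how an inferred camera context can be injected into the detection prompt as prior knowledge; the function and wording are hypothetical, not the exact prompt used in EVA.

```python
def build_context_aware_prompt(camera_context: str, user_rule: str) -> str:
    """Compose a VLM prompt that injects inferred camera context as prior knowledge.

    `camera_context` would come from a separate VLM call on a single reference
    frame (e.g. "indoor lab, fixed ceiling camera, workbenches in view").
    """
    return (
        "You are monitoring a smart camera feed.\n"
        f"Camera context: {camera_context}\n"
        f"Detection rule: {user_rule}\n"
        "Given the attached frame, decide whether the rule is violated "
        "and explain your reasoning briefly."
    )

print(build_context_aware_prompt(
    "indoor chemistry lab, fixed ceiling camera, fume hoods along the back wall",
    "Alert if anyone is not wearing safety goggles.",
))
```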

Improving Performance of Intent-Based Chat Command Execution

· 4 min read
Yura Shin
AI Specialist

Introduction

Users simply send a sentence to a Chat Agent: “Please start monitoring.” “Set the threshold for people to 0.6.” “Add ‘tree’ to the target list.”

While the interaction appears simple, the internal processing required by the LLM is highly complex.

Before taking any action, the LLM must determine the intent:

“Is this a target-setting task? Scenario editing? Or just querying information?”

Then it must:

  • extract required parameters
  • validate values
  • handle errors gracefully and explain what's wrong

Previously, the system attempted to perform all of these steps in a single LLM call.

Although this looked clean on the surface, it repeatedly caused unpredictable and hard-to-debug problems:

  • Wrong task classification → wrong actions executed
  • Rule conflicts between different tasks
  • Incorrect parameter extraction without validation
  • Exponential growth in maintenance due to entangled rules

To solve these core issues, the Chat Agent was redesigned using a LangGraph-based Multi-Node Routing architecture.




1. Even simple requests are “multi-stage decision-making” for LLMs

The previous Chat Agent tried to interpret everything in one LLM call.

For example, the request:

“Change the threshold for ‘tree’ to 0.3”

Internally required the LLM to:

  1. Identify the type of task
  2. Extract parameters (“tree: 0.3”)
  3. Validate the threshold value
  4. Check configuration conflicts
  5. Judge whether modification is allowed
  6. Respond in natural language

Trying to combine all logic into a single prompt and a single set of rules resulted in:

  • Rules for one task affecting others
  • Parameter parsing failures
  • Small changes requiring full prompt rewrites
  • Hard-coded and exploding error handling logic

At its peak, the prompt reached about 3,700 tokens and kept growing, making the system increasingly fragile.




2. Fundamental issues in the original architecture

The original LLM call served five roles at once:

  • Task classification
  • Parameter parsing
  • Value validation
  • Error handling
  • Natural language generation

This caused multiple structural issues:


2.1 Task rule conflicts

Target labels must be in English for video detection, but this rule was incorrectly applied to scenario descriptions as well, forcing English output even for Korean text.

Result: rules interfering across unrelated tasks.


2.2 Unreliable parameter parsing

Even simple numeric interpretation often failed:

  • “one point five” → interpreted as 0.15
  • Word-form numbers or locale-dependent formats → parsing failures

More edge cases → more instability.


2.3 Every error case required manual rule definitions

The LLM handled all error evaluation. Meaning:

  • Every possible error had to be pre-defined
  • Any new parameter → new rules → high maintenance



3. Introducing a Routing-Based Architecture

We rebuilt the system using a 3-Stage LangGraph Routing Pipeline.

Core principle:

One purpose per LLM call. Never ask the LLM to do multiple jobs at once.
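To make the principle concrete, here is a minimal LangGraph sketch of the three-stage pipeline. The state fields and node bodies are simplified placeholders, not EVA's actual implementation.

```python
from typing import Optional, TypedDict
from langgraph.graph import END, StateGraph

class AgentState(TypedDict, total=False):
    user_input: str
    task: str
    params: dict
    error: Optional[str]
    reply: str

# Each node owns exactly one responsibility (stub bodies for illustration).
def route_task(state: AgentState) -> AgentState:
    # LLM call #1: classify the request into a known task, nothing else.
    return {"task": "set_threshold"}

def parse_params(state: AgentState) -> AgentState:
    # LLM call #2: a task-specific prompt extracts only this task's parameters.
    return {"params": {"label": "tree", "threshold": 0.3}}

def validate(state: AgentState) -> AgentState:
    # No LLM here: plain system rules check the extracted values.
    t = state["params"].get("threshold", 0.0)
    return {"error": None if 0.0 <= t <= 1.0 else "threshold out of range"}

def explain_error(state: AgentState) -> AgentState:
    # LLM call #3: turn the validator's verdict into a user-friendly message.
    return {"reply": f"Sorry, {state['error']}. Threshold must be between 0.0 and 1.0."}

def execute(state: AgentState) -> AgentState:
    return {"reply": f"Applied {state['params']} for task '{state['task']}'."}

graph = StateGraph(AgentState)
for name, fn in [("router", route_task), ("parser", parse_params),
                 ("validator", validate), ("error_node", explain_error),
                 ("executor", execute)]:
    graph.add_node(name, fn)

graph.set_entry_point("router")
graph.add_edge("router", "parser")
graph.add_edge("parser", "validator")
graph.add_conditional_edges(
    "validator", lambda s: "error_node" if s.get("error") else "executor"
)
graph.add_edge("error_node", END)
graph.add_edge("executor", END)

app = graph.compile()
print(app.invoke({"user_input": "Change the threshold for 'tree' to 0.3"}))
```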


3.1 Task Routing Node

“Classify the request — and only that”

No parsing. No validation. No rule application.

Minimal responsibility → maximal reliability.

To pick the correct task, it uses:

  • Current request text
  • Available task list
  • Existing system state
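A routing prompt in this spirit might look like the following sketch; the task names, wording, and state fields are hypothetical.

```python
TASKS = ["set_target", "set_threshold", "edit_scenario", "start_monitoring", "query_status"]

def build_routing_prompt(user_input: str, system_state: dict) -> str:
    """Classification-only prompt: no parsing, no validation, no task rules."""
    return (
        "Classify the user request into exactly one task.\n"
        f"Available tasks: {', '.join(TASKS)}\n"
        f"Current system state: {system_state}\n"
        f"User request: {user_input}\n"
        "Answer with the task name only."
    )

print(build_routing_prompt(
    "Change the threshold for 'tree' to 0.3",
    {"monitoring": True, "targets": ["person", "tree"]},
))
```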

3.2 Task-Specific Parameter Parser

“Each task has isolated prompts, parsers, and rules”

Previously:

  • All tasks shared the same prompt → rule entanglement

Now:

  • Each task has its own prompt + parser + rules
  • Fully isolated LLM call

Examples:

  • Set-Target Task → dedicated logic only for targets
  • Start-Monitoring Task → independent logic only for monitoring

No more rule collisions or cross-contamination 🎯
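Isolation can be as simple as keeping one prompt and one expected output schema per task, with no shared rules. The sketch below is hypothetical, not EVA's production parser.

```python
import json

# One self-contained prompt + required-key list per task; nothing is shared.
TASK_PARSERS = {
    "set_threshold": {
        "prompt": (
            "Extract the target label and the numeric threshold from the request. "
            'Respond as JSON: {"label": <string>, "threshold": <number>}.'
        ),
        "required": ["label", "threshold"],
    },
    "set_target": {
        "prompt": (
            "Extract the detection target labels (in English) from the request. "
            'Respond as JSON: {"labels": [<string>, ...]}.'
        ),
        "required": ["labels"],
    },
}

def parse_task_params(task: str, llm_json_reply: str) -> dict:
    """Parse the isolated LLM reply for one task and check required keys only."""
    spec = TASK_PARSERS[task]
    params = json.loads(llm_json_reply)
    missing = [k for k in spec["required"] if k not in params]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    return params

print(parse_task_params("set_threshold", '{"label": "tree", "threshold": 0.3}'))
```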


3.3 Error Handling Node

“System validates. LLM explains.”

Process:

  • LLM extracts values
  • System Validator confirms correctness
  • If invalid → Error Node generates user-friendly explanation

Example (threshold 1.5):

  • Parser: threshold: 1.5
  • Validator: Out of allowed range
  • Error Node:

    “Threshold must be between 0.0 and 1.0. Please try again.”

LLM no longer decides errors — it only communicates them.
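In code, the split is roughly this: a deterministic validator owns the rules, and the error node only phrases the outcome. The range and messages below are illustrative placeholders.

```python
from typing import Optional

def validate_threshold(value: float) -> Optional[str]:
    """System rule: return an error code, or None if the value is acceptable."""
    if not 0.0 <= value <= 1.0:
        return "threshold_out_of_range"
    return None

def error_prompt(error_code: str, value: float) -> str:
    """The LLM only receives the verdict and turns it into a friendly reply."""
    return (
        f"The value threshold={value} was rejected with code '{error_code}'. "
        "Explain politely that thresholds must be between 0.0 and 1.0 "
        "and ask the user to try again."
    )

code = validate_threshold(1.5)
if code:
    print(error_prompt(code, 1.5))
```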




4. Performance Evaluation

Routing-based design didn’t only improve accuracy — it boosted maintainability, stability, and speed.


4.1 Task & Parameter Accuracy

Metric | Before | After
Task Routing Accuracy | 82.3% | 95.0%
Parameter Parsing Accuracy | 69.6% | 95.0%

Huge gain thanks to isolating classification and parsing 🎉


4.2 Prompt Length Reduction

Case | Before | After
Min | 1,603 tokens | 1,106 tokens
Max | 3,783 tokens | 1,793 tokens

Shorter → more deterministic & reliable LLM reasoning


4.3 Latency Improvement

Case | Before | After
Min | 1.19 s | 1.50 s
Max | 2.98 s | 2.03 s

Even with more calls, overall latency improved at peak load.




5. Conclusion

Key insight:

The problem wasn’t the LLM — it was how we were using the LLM.

One call doing all tasks → confusion, instability
Proper division of roles → stable and predictable performance

Each component now focuses only on its job:

Role | Owner
Task Classification | Router
Parameter Parsing | Task-specific Parser
Validation | System Rules
Error Communication | LLM (Error Node)

This restructure marks a major milestone — transforming EVA Chat Agent into a trustworthy AI control interface.

A more robust foundation means:

  • Easier expansion
  • More accurate automation
  • Better user experience
  • Lower maintenance cost

From One-Shot Decisions to Two-Stage Reasoning

· 7 min read
Seongwoo Kong
AI Specialist
Jisu Kang
AI Specialist
Keewon Jeong
Solution Architect

Instead of Making a Single Decision, Be Cautious Step-by-Step

The process of AI making a decision from a single camera image is more complex than most people think. Users may simply ask: “Notify me if someone falls down” or “Alert me when a worker isn’t wearing a mask.” But the AI has to analyze the image, check the requested conditions, consider exceptions, make the final decision, and explain the reasoning, all in a single pass.

In EVA, we introduced an Enriched Input structure that separates the user’s requirements into Detection conditions and Exception conditions, which significantly improved performance. However, even with structured input, the AI still made contradictory judgments in multi-condition scenarios.

The issue was not only how the conditions were structured, but also that the AI was forced to perform multiple judgments all at once. So EVA moved beyond the limitations of the existing one-shot approach and introduced a new Two-Stage Reasoning process.
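As a rough illustration of the split (not EVA's exact prompts), stage one only reports what is visible against the detection conditions, and stage two applies the exception conditions to those findings before the final verdict.

```python
def stage1_prompt(detection_conditions: list[str]) -> str:
    """Stage 1: observation only; no alert decision yet."""
    bullets = "\n".join(f"- {c}" for c in detection_conditions)
    return (
        "Look at the attached frame and report, for each condition below, "
        "whether it is observed and what evidence supports it:\n" + bullets
    )

def stage2_prompt(stage1_findings: str, exception_conditions: list[str]) -> str:
    """Stage 2: apply exceptions to the findings and make the final call."""
    bullets = "\n".join(f"- {c}" for c in exception_conditions)
    return (
        "Given these findings:\n" + stage1_findings + "\n"
        "Apply the exception conditions below, then answer ALERT or NO_ALERT "
        "with a one-sentence reason:\n" + bullets
    )

findings = "A worker is lying on the floor near the loading dock."
print(stage2_prompt(findings, ["Ignore people lying on designated rest benches."]))
```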

In this post, we cover:

  • Why structured input alone could not solve the problem
  • The fundamental limits of one-shot decision-making
  • Why AI works better when decisions are split into two stages
  • Performance improvements validated by real experiments

Turning Simple User Requests into AI-Understandable Instructions

· 11 min read
Seongwoo Kong
AI Specialist
Jisu Kang
AI Specialist
Keewon Jeong
Solution Architect

Expanding User Queries So AI Can Clearly Understand Intent

EVA is a system that operates based on user-issued commands. For EVA to make stable and accurate decisions, it is crucial that user requests are delivered in a form that AI can clearly understand.

However, even if the natural language expressions we use daily seem simple and clear to humans, they can be ambiguous from an AI model’s perspective, or they may require excessive implicit reasoning. This gap is exactly what often leads to AI system malfunctions or inaccurate decisions.

To fundamentally address this, EVA uses a Few-Shot prompting technique to automatically expand simple user requests into a structured query representation.
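A few-shot expansion prompt in this spirit might look like the sketch below; the examples and field names are hypothetical, not EVA's production prompt.

```python
# Few-shot examples map a terse request to a structured query representation.
FEW_SHOT = [
    (
        "Notify me if someone falls down",
        '{"target": "person", "detection": "a person lying on the ground", '
        '"exceptions": ["person lying on a designated rest bench"], "alert": true}',
    ),
    (
        "Alert me when a worker isn't wearing a mask",
        '{"target": "person", "detection": "a worker whose face is uncovered", '
        '"exceptions": ["worker eating in the break area"], "alert": true}',
    ),
]

def build_expansion_prompt(user_request: str) -> str:
    """Assemble the few-shot prompt that expands a simple request into JSON."""
    shots = "\n\n".join(f"Request: {q}\nExpanded: {a}" for q, a in FEW_SHOT)
    return (
        "Expand the user request into the structured JSON query format, "
        "following the examples.\n\n" + shots +
        f"\n\nRequest: {user_request}\nExpanded:"
    )

print(build_expansion_prompt("Tell me if a forklift enters the walkway"))
```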

In this post, we focus on:

  • Why simple natural-language requests are difficult for AI
  • How query expansion can improve AI’s understanding
  • How much performance improved in actual field deployments

and share practical methods and their impact for helping AI understand user intent more clearly.

Complete Mastery of vLLM: Optimization for EVA

· 17 min read
Taehoon Park
AI Specialist

In this article, we will explore how we optimized LLM service in EVA. We will walk through the adoption of vLLM to serve LLMs tailored for EVA, along with explanations of the core serving techniques.




1. Why Efficient GPU Resource Utilization is Necessary

Most people initially interact with cloud-based LLMs such as GPT, Gemini, or Claude. They deliver the best available performance with no model-operations burden: you simply need a URL and an API key. But API usage incurs continuous cost, and data must be transmitted externally, introducing security risks for personal or internal corporate data. When usage scales up, a natural question arises:

“Wouldn’t it be better to just deploy the model on our own servers…?”

There are many local LLMs available such as Alibaba’s Qwen and Meta’s LLaMA. As the open-source landscape expands, newer high-performance models are being released at a rapid pace, and the choices are diverse. However, applying them to real services introduces several challenges.

Running an LLM as-is results in very slow inference. This is due to the autoregressive nature of modern LLMs. There are optimizations like KV Cache and Paged Attention that dramatically reduce inference time. Several open-source serving engines implement these ideas — EVA uses vLLM. Each engine differs in model support and ease of use. Let’s explore why EVA chose vLLM.
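As a minimal example of vLLM's offline API (the model name and settings here are illustrative, not EVA's deployment), KV caching and PagedAttention are applied under the hood without any extra code:

```python
from vllm import LLM, SamplingParams

# Example model; swap in whatever local model your GPU can hold.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", gpu_memory_utilization=0.90)

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(
    ["Summarize why PagedAttention reduces KV-cache memory waste."], params
)
print(outputs[0].outputs[0].text)
```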

Eliminating False Positives in Human Detection Using Pose Estimation

· 6 min read
Euisuk Chung
AI Specialist

Introduction

“There’s a person over there!” our AI vision system confidently reported. Yet all we saw on the screen was an empty chair with a coat draped over it.

Human detection technology has advanced rapidly, but the real world is far more chaotic than polished demo videos. In the environments we focus on, the problem becomes even more noticeable:

  • 🏢 Office: empty chairs with jackets
  • 🔬 Laboratory: lab coats and protective clothing hanging on chairs
  • 💼 Work areas: vacant meeting rooms and lounges

Such false positives aren’t just “slightly wrong” results. They directly degrade system trust and efficiency.

For example:

  • Energy-saving systems may misjudge how many people are present and waste power.
  • Security systems may focus on “phantom personnel” and waste monitoring resources.

Example: an empty chair mistakenly detected as a seated human
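One common way to suppress this class of false positives (a sketch of the general idea, not necessarily EVA's exact rule) is to require that a detected "person" also yields enough confidently visible pose keypoints:

```python
def is_real_person(keypoint_scores, min_conf=0.3, min_visible=6):
    """Heuristic filter: keep a 'person' detection only if enough body keypoints
    are confidently visible. A coat draped over a chair rarely produces consistent
    keypoints for the head, shoulders, hips, and knees."""
    visible = sum(1 for s in keypoint_scores if s >= min_conf)
    return visible >= min_visible

# Example: confidence scores from a 17-keypoint (COCO) pose model for one detection.
chair_with_coat = [0.05, 0.10, 0.08, 0.20, 0.15, 0.40, 0.35, 0.10, 0.10,
                   0.05, 0.05, 0.10, 0.10, 0.05, 0.05, 0.02, 0.02]
print(is_real_person(chair_with_coat))  # False -> suppress the false positive
```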

Attention-Based Image-Guided Detection for Domain-Specific Object Recognition

· 5 min read
Hyunchan Moon
AI Specialist

Introduction: Practical Implementation of Image-Guided Detection

In the field of Open-Vocabulary Detection, OWL-v2 (Open-World Localization Vision Transformer v2) is a powerful model that can use both text and images as prompts. In particular, Image-Guided Detection with "Visual Prompting" allows users to find desired objects using only example images.

This post shares three core optimization techniques we applied to adapt OWL-v2's Image-Guided Detection for production environments.
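For reference, a minimal image-guided detection call with the Hugging Face transformers implementation of OWLv2 looks roughly like this; the file paths and thresholds are placeholders, and this is the stock API rather than our optimized pipeline.

```python
import torch
from PIL import Image
from transformers import Owlv2ForObjectDetection, Owlv2Processor

processor = Owlv2Processor.from_pretrained("google/owlv2-base-patch16-ensemble")
model = Owlv2ForObjectDetection.from_pretrained("google/owlv2-base-patch16-ensemble")

image = Image.open("scene.jpg")          # frame to search in (placeholder path)
query_image = Image.open("example.jpg")  # visual prompt: one example object

inputs = processor(images=image, query_images=query_image, return_tensors="pt")
with torch.no_grad():
    outputs = model.image_guided_detection(**inputs)

# Map predicted boxes back to the original image size and apply NMS.
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_image_guided_detection(
    outputs=outputs, threshold=0.8, nms_threshold=0.3, target_sizes=target_sizes
)
for score, box in zip(results[0]["scores"], results[0]["boxes"]):
    print(round(score.item(), 3), [round(v, 1) for v in box.tolist()])
```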