
The Future of AI Services Shown by OpenClaw

· 4 min read
Daniel Cho
Mellerikat Leader

Recently, OpenClaw has been generating significant buzz in the AI community. Running in local environments such as a Mac mini, this service interprets a user’s screen in real time and directly controls various applications — signaling an important shift in how we evaluate AI.

The competitive edge in AI is no longer defined by “how large or powerful a foundation model is,” but rather by “how effectively that model can perform complex tasks in real-world applications.”


A Paradigm Shift: From Performance to Execution

  • Old Paradigm: “How intelligent is it?” Until now, the AI industry has focused heavily on the scale and performance of foundation models. Large language models such as GPT-4, Claude, and Gemini competed on parameters, dataset size, and benchmark scores. The central question was: “How smart is the AI?”

  • New Paradigm: “How much work can it actually perform?” OpenClaw introduces a fundamentally different question: “How effectively can the model perform complex tasks in real-world environments?” AI value is no longer measured by raw intelligence alone, but by its ability to execute within real computing environments.

Teaching VLMs to Multitask: Enhancing Situation Awareness through Scenario Decomposition

· 8 min read
Hyunchan Moon
AI Specialist

At the core of EVA lies the ability to truly understand critical situations that occur simultaneously within a single scene—such as fires, people falling, or traffic accidents—without missing any of them. However, no matter how capable a Vision-Language Model (VLM) is, asking it to reason about too many things at once leads to a sharp degradation in cognitive performance.[2,3]

In this post, inspired by the recent text-to-video retrieval research Q₂E (Query-to-Event Decomposition)[1], we introduce Scenario Decomposition, a technique that enables VLMs to deeply understand complex, multi-scenario situations within a single frame.

Physical AI Implemented with EVA

· 3 min read
Gyulim Gu
Tech Leader

When Can AI Intervene in the Real World?

Accidents in industrial environments happen without warning. Moments such as a worker collapsing, an arm getting caught in machinery, or a fire breaking out usually occur within seconds.

Physical AI should not stop at recognizing these moments. It must be capable of translating perception into physical action on site.

In this post, we walk through a LEGO-based simulation to show how EVA detects incidents and how its decisions are connected to real equipment actions as a single, continuous flow.




Simplifying Industrial Scenarios with LEGO

Instead of replicating complex industrial environments in full detail, we simplified accident scenarios using LEGO.

We designed independent scenarios for:

  • a worker collapsing,
  • an arm being caught in equipment,
  • and a fire breaking out.

Arm caught in equipment – conveyor belt stops and warning light activates

EVA: A New Standard for Safety Management Beyond Physical Sensors

· 3 min read
Daniel Cho
Mellerikat Leader

EVA Accelerates the Golden Time for Fire Response

Securing the “golden time” during a fire incident in manufacturing facilities is one of the most critical factors in protecting both human life and physical assets. Traditional fire detection systems have long relied on physical sensors, but camera-based intelligent detection technologies are now rapidly replacing this role.

In this post, we analyze EVA’s smoke detection performance through a real-world validation test conducted at an LG Electronics facility and examine the technical significance of the results.




Field Validation Test: 8 Seconds vs. 38 Seconds

A smoke detection test simulating a real fire scenario was conducted at an LG Electronics production site. The core objective of this test was to compare the detection speed between the existing smoke detectors and the newly introduced EVA system.

The results were highly encouraging. Based on the moment when smoke began to rise, the average response times of each system were as follows:

EVA: Smoke detected approximately 8 seconds after occurrence

Conventional smoke detector: Smoke detected approximately 38 seconds after occurrence

As a result, EVA identified and propagated the hazardous situation more than four times faster than conventional smoke detectors. This 30-second difference represents a decisive window that can determine the success or failure of initial fire suppression.

The Synergy of EVA and Workflow Builder

· 6 min read
Gyulim Gu
Tech Leader

Beyond Observation: AI That Takes Action

The core challenge for AI today is no longer just analyzing data or describing scenes. A truly intelligent system must be able to drive meaningful actions in the physical world or corporate operational systems based on its analysis.

EVA is now moving beyond the role of 'eyes' and 'brain' that perceive visual information and judge situations, to join with the 'hands'—the Workflow Builder. This marks the completion of an End-to-End automation structure that moves past passive, notification-centric monitoring to independently judging site conditions and solving problems.


Performance Enhancement through Instruction Tuning Based on User Feedback Data

· 12 min read
Jaechan Lee
POSTECH
Yura Shin
AI Specialist

This work is a collaborative research effort with Minjoon Son (advised by Prof. Youngmyung Ko) as part of the "Campus: Beyond Safety to Intelligence – Postech Living Lab Project with EVA"


🎯 Introduction: Shifting Feedback from 'Retrospective Correction' to 'Cognitive Enhancement'

When EVA makes judgments based on images, operators often provide specific feedback like: "This is indeed a safety vest. Why did it get confused?" or "Shouldn't there be an alert here?" This feedback contains not just the right or wrong answer, but also the human reasoning and context behind the judgment.

Previously, EVA utilized this feedback by storing it in a separate Vector DB and using it to adjust the Alert status when similar situations occurred. While this approach offered the advantage of quick application, it had a structural limitation: it did not improve the model's intrinsic reasoning capability and merely retrospectively filtered errors.

To fundamentally address this issue, we completely changed our approach. We reconstructed user feedback not as simple error reports, but as Instruction Data that the model can directly use in its inference process to strengthen its Visual Reasoning capability.

This article will focus on how VLM-based Instruction Tuning utilizing user feedback data overcomes the limitations of the previous Vector DB-centric approach and improves the model's visual reasoning performance.

From Image to Language, From Language to Reasoning: Boosting VLM Performance with Camera Context

· 7 min read
Minjun Son
POSTECH
Jisu Kang
AI Specialist

This work is a collaborative research effort with Minjoon Son (advised by Prof. Youngmyung Ko) as part of the "Campus: Beyond Safety to Intelligence – Postech Living Lab Project with EVA"


📝 Introduction: Making User Queries Smarter: Enhancing Language with Image Context

EVA is a system that detects anomalies using hundreds to thousands of smart cameras. We used a VLM/LLM to automatically infer the camera context and embedded it into the prompt, creating a camera-context-aware anomaly detection pipeline that reflects the situation of the target image. By leveraging the camera context extracted from a single frame as prior knowledge for the VLM, we confirmed a meaningful improvement in accuracy and deeper interpretability compared to the existing baseline.

Improving Performance of Intent-Based Chat Command Execution

· 4 min read
Yura Shin
AI Specialist

Introduction

Users simply send a sentence to a Chat Agent:

  • “Please start monitoring.”
  • “Set the threshold for people to 0.6.”
  • “Add ‘tree’ to the target list.”

While the interaction appears simple, the internal processing required by the LLM is highly complex.

Before taking any action, the LLM must determine the intent:

“Is this a target-setting task? Scenario editing? Or just querying information?”

Then it must:

  • extract required parameters
  • validate values
  • handle errors gracefully and explain what's wrong

Previously, the system attempted to perform all of these steps in a single LLM call.

Although this looked clean on the surface, it repeatedly caused unpredictable and hard-to-debug problems:

  • Wrong task classification → wrong actions executed
  • Rule conflicts between different tasks
  • Incorrect parameter extraction without validation
  • Exponential growth in maintenance due to entangled rules

To solve these core issues, the Chat Agent was redesigned using a LangGraph-based Multi-Node Routing architecture.




1. Even simple requests are “multi-stage decision-making” for LLMs

The previous Chat Agent tried to interpret everything in one LLM call.

For example, the request:

“Change the threshold for ‘tree’ to 0.3”

Internally required the LLM to:

  1. Identify the type of task
  2. Extract parameters (“tree: 0.3”)
  3. Validate the threshold value
  4. Check configuration conflicts
  5. Judge whether modification is allowed
  6. Respond in natural language

Trying to combine all logic into a single prompt and a single set of rules resulted in:

  • Rules for one task affecting others
  • Parameter parsing failures
  • Small changes requiring full prompt rewrites
  • Hard-coded and exploding error handling logic

At its peak, the prompt reached roughly 3,700 tokens, and it kept growing and becoming more fragile.




2. Fundamental issues in the original architecture

The original LLM call served five roles at once:

  • Task classification
  • Parameter parsing
  • Value validation
  • Error handling
  • Natural language generation

This caused multiple structural issues:


2.1 Task rule conflicts

Target labels must be in English for video detection, but this rule was also applied, incorrectly, to scenario descriptions, forcing English output even for Korean text.

Result: rules interfering across unrelated tasks.


2.2 Unreliable parameter parsing

Even simple numeric interpretation often failed:

  • “one point five” → interpreted as 0.15
  • Word-form numbers or locale-dependent formats → parsing failures

More edge cases → more instability.
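To make the failure mode concrete, here is a minimal sketch of what a deterministic number normalizer could look like. It is not taken from EVA: the function name `normalize_number`, the digit table, and the decision to return `None` for anything unrecognized are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical lookup table: digit-by-digit spoken numbers only.
_DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
           "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}


def normalize_number(text: str) -> Optional[float]:
    """Convert a numeric string, a comma decimal, or spoken digits to a float."""
    cleaned = text.strip().lower()
    try:
        return float(cleaned.replace(",", "."))  # handles "0.6" and "0,6"
    except ValueError:
        pass
    pieces = []
    for word in cleaned.split():
        if word == "point":
            pieces.append(".")
        elif word in _DIGITS:
            pieces.append(_DIGITS[word])
        else:
            return None  # unknown word: refuse to guess
    try:
        return float("".join(pieces))
    except ValueError:
        return None  # e.g. empty input or a lone "point"


print(normalize_number("one point five"))
print(normalize_number("0,6"))
print(normalize_number("about half"))
```

Under these assumptions, "one point five" maps to 1.5 rather than 0.15, and input the table does not cover is rejected instead of being guessed.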


2.3 Every error case required manual rule definitions

The LLM handled all error evaluation, which meant:

  • Every possible error had to be pre-defined
  • Any new parameter → new rules → high maintenance



3. Introducing a Routing-Based Architecture

We rebuilt the system using a 3-Stage LangGraph Routing Pipeline.

Core principle:

One purpose per LLM call. Never ask the LLM to do multiple jobs at once.
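The control flow of such a pipeline can be sketched in plain Python. This is an illustration of the "one purpose per call" principle only: it does not use LangGraph, the stage functions are hypothetical stubs standing in for the separate LLM calls, and names such as `run_pipeline` and `stub_router` are invented for the example.

```python
from typing import Callable, Dict, Optional

Params = Dict[str, float]


def run_pipeline(
    request: str,
    router: Callable[[str], str],
    parsers: Dict[str, Callable[[str], Params]],
    validators: Dict[str, Callable[[Params], Optional[str]]],
    explain_error: Callable[[str], str],
) -> str:
    """Pass one request through routing, parsing, and validation, in that order."""
    task = router(request)               # stage 1: classify, nothing else
    params = parsers[task](request)      # stage 2: parser that belongs to this task
    problem = validators[task](params)   # stage 3: the system decides validity
    if problem is not None:
        return explain_error(problem)    # the error node only words the message
    return f"OK: {task} {params}"


# Hypothetical stubs standing in for the separate LLM calls.
def stub_router(request: str) -> str:
    return "set_threshold" if "threshold" in request.lower() else "start_monitoring"


def stub_parse_threshold(request: str) -> Params:
    return {"threshold": float(request.split()[-1])}  # assumes the number is the last word


def stub_check_threshold(params: Params) -> Optional[str]:
    return None if 0.0 <= params["threshold"] <= 1.0 else "threshold out of range"


PARSERS = {"set_threshold": stub_parse_threshold, "start_monitoring": lambda request: {}}
VALIDATORS = {"set_threshold": stub_check_threshold, "start_monitoring": lambda params: None}


def handle(request: str) -> str:
    return run_pipeline(request, stub_router, PARSERS, VALIDATORS,
                        lambda problem: f"Error: {problem}")


print(handle("Set the threshold for people to 0.6"))
print(handle("Set the threshold for people to 1.5"))
```

Each stage receives only what it needs, so changing one task's parser or validator does not touch the router or any other task.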


3.1 Task Routing Node

“Classify the request — and only that”

No parsing. No validation. No rule application.

Minimal responsibility → maximal reliability.

Uses:

  • Current request text
  • Available task list
  • Existing system state

→ to pick the correct task.
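A routing node with this narrow responsibility might look like the sketch below. The prompt wording, the `UNKNOWN_TASK` fallback, and the `llm` callable (any function that maps a prompt string to a reply string) are assumptions for illustration, not EVA's actual prompt or API.

```python
from typing import Callable, Dict, List

UNKNOWN_TASK = "unknown"  # hypothetical fallback label


def build_routing_prompt(request: str, tasks: List[str], state: Dict[str, object]) -> str:
    """Assemble a classification-only prompt from the three router inputs."""
    task_lines = "\n".join(f"- {name}" for name in tasks)
    state_lines = "\n".join(f"- {key}: {value}" for key, value in sorted(state.items()))
    return (
        "Pick exactly one task name for the request. Answer with the name only.\n"
        f"Tasks:\n{task_lines}\n"
        f"System state:\n{state_lines}\n"
        f"Request: {request}"
    )


def route_task(request: str, tasks: List[str], state: Dict[str, object],
               llm: Callable[[str], str]) -> str:
    """Return a task name from the list, or UNKNOWN_TASK if the reply is off-list."""
    answer = llm(build_routing_prompt(request, tasks, state)).strip().lower()
    return answer if answer in tasks else UNKNOWN_TASK


# Usage with a fake LLM that always answers "set_target".
tasks = ["set_target", "start_monitoring"]
print(route_task("Add 'tree' to the target list", tasks,
                 {"monitoring": False}, lambda prompt: "set_target"))
```

Checking the reply against the task list means a malformed or chatty answer cannot trigger an arbitrary action. It falls through to the fallback label instead.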

3.2 Task-Specific Parameter Parser

“Each task has isolated prompts, parsers, and rules”

Previously:

  • All tasks shared the same prompt → rule entanglement

Now:

  • Each task has its own prompt + parser + rules
  • Fully isolated LLM call

Examples:

  • Set-Target Task → dedicated logic only for targets
  • Start-Monitoring Task → independent logic only for monitoring

No more rule collisions or cross-contamination 🎯
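One way to picture this isolation is a registry that maps each task to its own prompt and parser. The sketch below is an assumption-laden illustration: `TaskSpec`, the prompt texts, and the comma-separated reply format are hypothetical, and `llm` is any callable from prompt to reply.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class TaskSpec:
    prompt: str                                 # prompt used only for this task
    parse: Callable[[str], Dict[str, object]]   # turns the LLM reply into parameters


def parse_set_target(reply: str) -> Dict[str, object]:
    # The English-label rule lives only in this task's prompt and parser.
    return {"targets": [label.strip().lower() for label in reply.split(",") if label.strip()]}


def parse_start_monitoring(reply: str) -> Dict[str, object]:
    return {}  # this task takes no parameters in the sketch


TASKS: Dict[str, TaskSpec] = {
    "set_target": TaskSpec("List the target labels in English, comma-separated.",
                           parse_set_target),
    "start_monitoring": TaskSpec("Confirm that monitoring should start.",
                                 parse_start_monitoring),
}


def run_parser(task: str, request: str, llm: Callable[[str], str]) -> Dict[str, object]:
    """Call the LLM with this task's prompt only, then apply this task's parser."""
    spec = TASKS[task]
    reply = llm(f"{spec.prompt}\nRequest: {request}")
    return spec.parse(reply)


print(run_parser("set_target", "Add 'tree' to the target list", lambda prompt: "Tree"))
```

Because the prompt for `set_target` never mentions monitoring, a rule added for one task cannot leak into another.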


3.3 Error Handling Node

“System validates. LLM explains.”

Process:

  • LLM extracts values
  • System Validator confirms correctness
  • If invalid → Error Node generates user-friendly explanation

Example (threshold 1.5):

  • Parser: threshold: 1.5
  • Validator: Out of allowed range
  • Error Node:

    “Threshold must be between 0.0 and 1.0. Please try again.”

LLM no longer decides errors — it only communicates them.
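The split between deciding and explaining can be sketched as follows. The 0.0 to 1.0 range comes from the example above. The error codes, the function names, and the `llm` callable are assumptions for illustration.

```python
from typing import Callable, Dict, Optional

THRESHOLD_MIN, THRESHOLD_MAX = 0.0, 1.0  # range quoted in the example above


def validate_threshold(params: Dict[str, float]) -> Optional[str]:
    """Return a machine-readable error code, or None when the value is valid."""
    value = params.get("threshold")
    if value is None:
        return "missing_threshold"
    if not THRESHOLD_MIN <= value <= THRESHOLD_MAX:
        return "threshold_out_of_range"
    return None


def error_node(code: str, llm: Callable[[str], str]) -> str:
    """Ask the LLM only to phrase an error the system has already decided on."""
    facts = {
        "missing_threshold": "No threshold value was given.",
        "threshold_out_of_range":
            f"Threshold must be between {THRESHOLD_MIN} and {THRESHOLD_MAX}.",
    }
    return llm(f"Explain this problem to the user politely: {facts[code]}")


code = validate_threshold({"threshold": 1.5})
print(code)
if code is not None:
    # An echoing stand-in for the LLM, so the sketch runs without a model.
    print(error_node(code, lambda prompt: prompt.split(": ", 1)[1] + " Please try again."))
```

Adding a new parameter then means adding one validator and one fact string, rather than teaching the LLM a new error rule.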




4. Performance Evaluation

Routing-based design didn’t only improve accuracy — it boosted maintainability, stability, and speed.


4.1 Task & Parameter Accuracy

| Metric | Before | After |
| --- | --- | --- |
| Task Routing Accuracy | 82.3% | 95.0% |
| Parameter Parsing Accuracy | 69.6% | 95.0% |

Huge gain thanks to isolating classification and parsing 🎉


4.2 Prompt Length Reduction

| Case | Before | After |
| --- | --- | --- |
| Min | 1,603 tokens | 1,106 tokens |
| Max | 3,783 tokens | 1,793 tokens |

Shorter prompts → more consistent and reliable LLM output


4.3 Latency Improvement

| Case | Before | After |
| --- | --- | --- |
| Min | 1.19 s | 1.50 s |
| Max | 2.98 s | 2.03 s |

Even with more LLM calls, worst-case latency dropped from 2.98 s to 2.03 s, while best-case latency rose slightly from 1.19 s to 1.50 s.
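The before/after figures in sections 4.1 to 4.3 can be turned into comparable deltas with a few lines of arithmetic. The numbers below are copied from the tables. The helper names are invented for this sketch.

```python
from typing import Dict, Tuple

Pair = Tuple[float, float]  # (before, after)

ACCURACY_PCT: Dict[str, Pair] = {"task routing": (82.3, 95.0),
                                 "parameter parsing": (69.6, 95.0)}
PROMPT_TOKENS: Dict[str, Pair] = {"min": (1603, 1106), "max": (3783, 1793)}
LATENCY_S: Dict[str, Pair] = {"min": (1.19, 1.50), "max": (2.98, 2.03)}


def point_gain(pair: Pair) -> float:
    """Absolute difference in percentage points (for accuracy figures)."""
    before, after = pair
    return after - before


def relative_change(pair: Pair) -> float:
    """Signed change as a percentage of the 'before' value."""
    before, after = pair
    return (after - before) / before * 100.0


for name, pair in ACCURACY_PCT.items():
    print(f"{name} accuracy: {point_gain(pair):+.1f} points")
for name, pair in PROMPT_TOKENS.items():
    print(f"{name} prompt length: {relative_change(pair):+.1f}%")
for name, pair in LATENCY_S.items():
    print(f"{name} latency: {relative_change(pair):+.1f}%")
```

This works out to gains of about 12.7 and 25.4 accuracy points, prompt lengths shorter by about 31.0% (min) and 52.6% (max), and latency about 26.1% higher in the best case but about 31.9% lower in the worst case.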




5. Conclusion

Key insight:

The problem wasn’t the LLM — it was how we were using the LLM.

One call doing all tasks → confusion, instability

Proper division of roles → stable and predictable performance

Each component now focuses only on its job:

| Role | Owner |
| --- | --- |
| Task Classification | Router |
| Parameter Parsing | Task-specific Parser |
| Validation | System Rules |
| Error Communication | LLM (Error Node) |

This restructure marks a major milestone — transforming EVA Chat Agent into a trustworthy AI control interface.

A more robust foundation means:

  • Easier expansion
  • More accurate automation
  • Better user experience
  • Lower maintenance cost

From One-Shot Decisions to Two-Stage Reasoning

· 7 min read
Seongwoo Kong
AI Specialist
Jisu Kang
AI Specialist
Keewon Jeong
Solution Architect

Instead of Making a Single Decision, Be Cautious Step-by-Step

The process by which AI makes a decision from a single camera image is more complex than most people think. Users may simply ask, “Notify me if someone falls down” or “Alert me when a worker isn’t wearing a mask.” But the AI has to analyze the image, check the requested conditions, consider exceptions, make the final decision, and explain the reasoning, all in a single pass.

In EVA, we introduced an Enriched Input structure that separates the user’s requirements into Detection conditions and Exception conditions, which significantly improved performance. However, even with structured input, the AI still made contradictory judgments in multi-condition scenarios.

The issue was not only about structuring the conditions — but also about forcing the AI to perform multiple judgments all at once. So EVA moved beyond the limitations of the existing one-shot approach and introduced a new Two-Stage Reasoning process.

In this post, we cover:

  • Why structured input alone could not solve the problem
  • The fundamental limits of one-shot decision-making
  • Why AI works better when decisions are split into two stages
  • Performance improvements validated by real experiments

Turning Simple User Requests into AI-Understandable Instructions

· 11 min read
Seongwoo Kong
AI Specialist
Jisu Kang
AI Specialist
Keewon Jeong
Solution Architect

Expanding User Queries So AI Can Clearly Understand Intent

EVA is a system that operates based on user-issued commands. For EVA to make stable and accurate decisions, it is crucial that user requests are delivered in a form that AI can clearly understand.

However, even if the natural language expressions we use daily seem simple and clear to humans, they can be ambiguous from an AI model’s perspective, or they may require excessive implicit reasoning. This gap is exactly what often leads to AI system malfunctions or inaccurate decisions.

To fundamentally address this, EVA uses a Few-Shot prompting technique to automatically expand simple user requests into a structured query representation.

In this post, we focus on:

  • Why simple natural-language requests are difficult for AI
  • How query expansion can improve AI’s understanding
  • How much performance improved in actual field deployments

and share practical methods and their impact for helping AI understand user intent more clearly.