
3 posts tagged with "VLM"

Covers Vision-Language Model-based technologies and real-world applications.


Turning Simple User Requests into AI-Understandable Instructions

· 11 min read
Seongwoo Kong
AI Specialist
Jisu Kang
AI Specialist
Keewon Jeong
Solution Architect

Expanding User Queries So AI Can Clearly Understand Intent

EVA is a system that operates based on user-issued commands. For EVA to make stable and accurate decisions, it is crucial that user requests are delivered in a form that AI can clearly understand.

However, while the natural-language expressions we use daily seem simple and clear to humans, they can be ambiguous from an AI model's perspective, or require excessive implicit reasoning. This gap is exactly what often leads to AI system malfunctions or inaccurate decisions.

To fundamentally address this, EVA uses a Few-Shot prompting technique to automatically expand simple user requests into a structured query representation.

In this post, we focus on:

  • Why simple natural-language requests are difficult for AI
  • How query expansion can improve AI’s understanding
  • How much performance improved in actual field deployments

and share practical methods and their impact for helping AI understand user intent more clearly.
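As a rough illustration of the idea (not EVA's actual implementation), few-shot query expansion can be sketched as follows. The example schema, the few-shot pairs, and the `llm_call` function are all hypothetical stand-ins for whatever model call and structured representation a real system uses:

```python
import json

# Hypothetical few-shot examples: each pairs a terse user request
# with the structured query we want the model to imitate.
FEW_SHOT = [
    ("Alert me if someone falls",
     {"event": "person_fall", "action": "alert", "zone": "any"}),
    ("Watch the loading dock at night",
     {"event": "intrusion", "action": "monitor",
      "zone": "loading_dock", "time": "night"}),
]

def build_prompt(user_request: str) -> str:
    """Assemble a few-shot prompt asking the model to expand the
    request into the structured JSON form shown in the examples."""
    lines = ["Expand the user request into a structured JSON query.", ""]
    for req, expanded in FEW_SHOT:
        lines.append(f"Request: {req}")
        lines.append(f"Query: {json.dumps(expanded)}")
        lines.append("")
    lines.append(f"Request: {user_request}")
    lines.append("Query:")
    return "\n".join(lines)

def expand_query(user_request: str, llm_call) -> dict:
    """llm_call is any function that sends a prompt to an LLM and
    returns its text completion; the completion is parsed as JSON."""
    return json.loads(llm_call(build_prompt(user_request)))
```

Because the model only has to imitate the examples rather than invent a format, the downstream system receives a predictable, machine-checkable query instead of free-form text.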

Complete Mastery of vLLM: Optimization for EVA

· 17 min read
Taehoon Park
AI Specialist

In this article, we explore how we optimized LLM serving in EVA. We walk through why we adopted vLLM to serve LLMs tailored for EVA, along with the core serving techniques behind it.




1. Why Efficient GPU Resource Utilization is Necessary

Most people initially interact with cloud-based LLMs such as GPT, Gemini, or Claude. These deliver the best performance available without any worry about model operations: you simply need a URL and an API key. But API usage incurs continuous cost, and data must be transmitted externally, introducing security risks for personal or internal corporate data. When usage scales up, a natural question arises:

“Wouldn’t it be better to just deploy the model on our own servers…?”

There are many local LLMs available such as Alibaba’s Qwen and Meta’s LLaMA. As the open-source landscape expands, newer high-performance models are being released at a rapid pace, and the choices are diverse. However, applying them to real services introduces several challenges.

Running an LLM as-is results in very slow inference, a consequence of the autoregressive nature of modern LLMs. Optimizations such as KV Cache and PagedAttention dramatically reduce inference time, and several open-source serving engines implement them; EVA uses vLLM. Each engine differs in model support and ease of use, so let's explore why EVA chose vLLM.
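To see why KV caching matters, consider a toy cost model (a deliberate simplification, not vLLM internals): at each autoregressive step the new token attends over the full prefix, so without a cache the prefix's keys and values are recomputed at every step, making total work grow quadratically with output length, whereas with a cache each step computes keys/values for the new token only.

```python
def decode_cost(prompt_len: int, new_tokens: int, use_kv_cache: bool) -> int:
    """Count key/value computations during autoregressive decoding.

    Without a KV cache, every step recomputes K/V for the entire
    sequence so far; with one, each step computes K/V for the newly
    generated token only.
    """
    cost = 0
    for step in range(new_tokens):
        if use_kv_cache:
            cost += 1                       # only the newest token's K/V
        else:
            cost += prompt_len + step + 1   # recompute K/V for whole sequence
    return cost
```

For a 100-token prompt and 50 generated tokens, the cached decode does 50 K/V computations versus 6,275 without a cache; PagedAttention builds on this by storing that cache in non-contiguous fixed-size blocks so GPU memory is not wasted on fragmentation.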

Advancing the Lightweight Model-Based Scenario Detection Agent

· 9 min read
Seongwoo Kong
AI Specialist
Jisu Kang
AI Specialist
Keewon Jeong
Solution Architect

A field-proven architecture that raised alert quality by separating Detection and Exception

CCTV safety monitoring: why 2-Step Inference was the answer.
Miss no incidents, but cut the needless alerts.

Considering resource efficiency and real-time responsiveness, we moved from Qwen2.5-32B to a lighter yet competitive model architecture and introduced a 2-Step Inference approach. During the EVA beta test, the Qwen3 8B model showed better overall reasoning ability than the Qwen2.5 32B model, but it struggled to generate multiple answers consistently in a single pass, and it also showed limitations in the task of producing alert responses in the user's language.

For example, even when the alert was actually True, the model's generated alert response would sometimes describe the situation as if the alert were False, a clear contradiction. We designed 2-Step Inference as a way to use an 8B model with these long-context inference limits more effectively.

This post therefore focuses on the 2-Step Inference architecture: the limitations of the original 1-Step Inference, and how the precision/recall trade-off changed once Detection and Exception judgments were separated via 2-Step Inference.
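The detection/exception split described above can be sketched roughly as follows. The function names, the `AlertDecision` shape, and the two callbacks are hypothetical; `detect` and `is_exception` stand in for two separate, narrowly scoped lightweight-model inferences:

```python
from dataclasses import dataclass

@dataclass
class AlertDecision:
    alert: bool
    reason: str

def two_step_inference(scene_description: str,
                       detect,        # step 1: is the target scenario present?
                       is_exception,  # step 2: does a suppression rule apply?
                       ) -> AlertDecision:
    """Split alert judgment into two focused inferences.

    Step 1 (Detection) only decides whether the target scenario is
    present; Step 2 (Exception) only decides whether an exception
    rule suppresses the alert. Each call answers one narrow question,
    instead of asking a small model for several consistent answers
    in a single long-context pass.
    """
    if not detect(scene_description):
        return AlertDecision(alert=False, reason="scenario not detected")
    if is_exception(scene_description):
        return AlertDecision(alert=False, reason="exception rule applied")
    return AlertDecision(alert=True, reason="scenario detected, no exception")
```

Because Detection controls recall (step 1 should miss nothing) and Exception controls precision (step 2 filters false alarms), the two error sources can be measured and tuned independently.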