
Complete Mastery of vLLM: Optimization for EVA

· 17 min read
Taehoon Park
AI Specialist

In this article, we will explore how we optimized LLM serving in EVA. We will walk through our adoption of vLLM to serve LLMs tailored for EVA, along with explanations of the core serving techniques.




1. Why Efficient GPU Resource Utilization is Necessary

Most people initially interact with cloud-based LLMs such as GPT, Gemini, and Claude. They deliver the best available performance with no model-operations burden: you simply need a URL and an API key. But API usage incurs ongoing costs, and data must be transmitted externally, which poses security risks for personal or internal corporate data. As usage scales up, a natural question arises:

“Wouldn’t it be better to just deploy the model on our own servers…?”

There are many local LLMs available such as Alibaba’s Qwen and Meta’s LLaMA. As the open-source landscape expands, newer high-performance models are being released at a rapid pace, and the choices are diverse. However, applying them to real services introduces several challenges.

Running an LLM as-is results in very slow inference. This is due to the autoregressive nature of modern LLMs: each token is generated one at a time, conditioned on everything generated so far. Optimizations such as KV caching and PagedAttention dramatically reduce inference time, and several open-source serving engines implement these ideas — EVA uses vLLM. Each engine differs in model support and ease of use. Let's explore why EVA chose vLLM.
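As a rough illustration of the idea behind KV caching (a toy NumPy sketch, not vLLM's actual implementation): each decoding step stores its key/value vectors in a cache, so later steps only compute attention against the cached entries instead of re-running the entire prefix through the model.

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for a single query vector.
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

class KVCache:
    """Toy KV cache: store each step's key/value instead of recomputing them."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, k, v, q):
        # Append this token's key/value, then attend over the whole cache.
        self.keys.append(k)
        self.values.append(v)
        return attention(q, np.stack(self.keys), np.stack(self.values))

rng = np.random.default_rng(0)
d = 8
cache = KVCache()
for _ in range(5):  # 5 decoding steps; each step reuses all cached K/V
    k, v, q = rng.normal(size=(3, d))
    out = cache.step(k, v, q)
print(out.shape)  # (8,)
```

vLLM's PagedAttention goes further by storing this cache in fixed-size blocks (like virtual-memory pages), which avoids fragmentation and lets many requests share GPU memory efficiently.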

Eliminating False Positives in Human Detection Using Pose Estimation

· 6 min read
Euisuk Chung
AI Specialist

Introduction

“There’s a person over there!” Our AI vision system confidently reported. Yet all we saw on the screen was an empty chair with a coat draped over it.

Human detection technology has advanced rapidly, but the real world is far more chaotic than polished demo videos. In the environments we focus on, the problem becomes even more noticeable:

  • 🏢 Office: empty chairs with jackets
  • 🔬 Laboratory: lab coats and protective clothing hanging on chairs
  • 💼 Work areas: vacant meeting rooms and lounges

Such false positives aren’t just “slightly wrong” results. They directly degrade system trust and efficiency.

For example:

  • Energy-saving systems may misjudge how many people are present and waste power.
  • Security systems may focus on “phantom personnel” and waste monitoring resources.

Example: an empty chair mistakenly detected as a seated human
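The approach named in the title can be pictured as a simple post-filter (a hypothetical sketch; the thresholds and the 17-keypoint COCO layout are illustrative assumptions, not the values used in the actual system): keep a "person" detection only if pose estimation finds enough confident keypoints on it.

```python
# Hypothetical sketch: reject a "person" detection unless its pose looks human.
# KEYPOINT_THRESHOLD and MIN_VISIBLE_KEYPOINTS are illustrative values.
KEYPOINT_THRESHOLD = 0.3      # per-keypoint confidence cutoff
MIN_VISIBLE_KEYPOINTS = 6     # a draped coat rarely yields this many

def is_real_person(keypoint_scores):
    """keypoint_scores: one confidence per COCO keypoint (17 values)."""
    visible = sum(1 for s in keypoint_scores if s >= KEYPOINT_THRESHOLD)
    return visible >= MIN_VISIBLE_KEYPOINTS

# A coat on a chair: the detector fires, but pose estimation finds
# almost no confident keypoints on it.
coat_on_chair = [0.05] * 15 + [0.4, 0.35]
seated_person = [0.9, 0.8, 0.85] + [0.7] * 14

print(is_real_person(coat_on_chair))   # False
print(is_real_person(seated_person))   # True
```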

PoV on Physical AI

· 6 min read
Daniel Cho
Mellerikat Leader

Beyond Robot AI...

The concept of Physical AI is often equated with robotic technology. Many envision a future where robots freely navigate spaces and perform tasks on behalf of humans. However, the reality is that it will take considerable time for the technology to reach that level. Despite this, much of the current discussion around Physical AI remains robot-centric — which is limiting.

Physical AI does not need to exist solely in the form of a robot. There are already a wide variety of interfaces in our physical world that can interact with AI.

Attention-Based Image-Guided Detection for Domain-Specific Object Recognition

· 5 min read
Hyunchan Moon
AI Specialist

Introduction: Practical Implementation of Image-Guided Detection

In the field of Open-Vocabulary Detection, OWL-v2 (Open-World Localization Vision Transformer v2) is a powerful model that can use both text and images as prompts. In particular, Image-Guided Detection via visual prompting lets users find desired objects with nothing more than example images.

This post shares three core optimization techniques we applied while adapting OWL-v2's Image-Guided Detection to fit production environments.
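One way to picture the core of image-guided detection (a minimal sketch with made-up embeddings; `image_guided_matches` and its threshold are illustrative, not OWL-v2's actual API): compare the visual-prompt embedding against each candidate box embedding by cosine similarity and keep the close matches.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def image_guided_matches(query_emb, box_embs, threshold=0.8):
    """Return indices of candidate boxes whose embedding is close to the
    visual-prompt embedding. The threshold is an illustrative value."""
    return [i for i, e in enumerate(box_embs)
            if cosine_sim(query_emb, e) >= threshold]

# Toy embeddings: the prompt, a near-duplicate, an unrelated object,
# and a scaled copy (same direction, so cosine similarity is still 1).
query = np.array([1.0, 0.0, 0.0, 0.0])
boxes = [np.array([0.9, 0.1, 0.0, 0.0]),
         np.array([0.0, 1.0, 0.0, 0.0]),
         np.array([2.0, 0.0, 0.0, 0.0])]
print(image_guided_matches(query, boxes))  # [0, 2]
```

In OWL-v2 the query embedding itself comes from running the example image through the detector and selecting a representative box embedding; the matching step above is the same idea applied at scale.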

Meta-Intelligence of LLM Observability

· 3 min read
Daniel Cho
Mellerikat Leader

The Evolution of Observability into Meta-Intelligence in LLMOps

To effectively implement LLM services, a robust LLMOps framework is essential. Among its components, observability (o11y) has evolved beyond simple monitoring to become a critical enabler of the system’s meta-intelligence.





The Evolution of o11y into Meta-Intelligence

Early LLM o11y focused on collecting metrics such as token usage, response time, response content, and user feedback to monitor performance. We adopted LangSmith, a commercial tool, to monitor the execution process of our AI logic. Later, we integrated Langfuse, an open-source tool, allowing our organization to selectively use either tool based on licensing requirements.

However, as the number of AI Agent service users grew, it became clear that accumulated data could no longer provide meaningful insights through simple log analysis. Consequently, we decided to transform o11y data from mere "observation logs" into a meta-intelligence tool. This system leverages AI Agent outputs and user feedback to automatically reformulate questions or enhance response quality by adjusting model behavior.

In essence, o11y data transcends real-time performance monitoring to become the cornerstone of a feedback loop that enables AI Agents to self-improve.
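As a minimal sketch of turning observation logs into an improvement signal (the trace schema and thresholds here are illustrative assumptions, not Langfuse's or LangSmith's export format): aggregate user feedback per question and flag those with consistently poor feedback as candidates for automatic reformulation.

```python
from collections import defaultdict

# Hypothetical trace records, loosely shaped like what an o11y tool exports.
traces = [
    {"question": "reset password", "tokens": 120, "latency_ms": 800,  "feedback": 1},
    {"question": "reset password", "tokens": 150, "latency_ms": 900,  "feedback": 1},
    {"question": "export report",  "tokens": 300, "latency_ms": 1200, "feedback": -1},
    {"question": "export report",  "tokens": 280, "latency_ms": 1100, "feedback": -1},
]

def questions_to_reformulate(traces, min_runs=2, max_avg_feedback=0.0):
    """Flag questions whose average user feedback is poor — candidates for
    the self-improvement loop (automatic reformulation or behavior tuning)."""
    stats = defaultdict(list)
    for t in traces:
        stats[t["question"]].append(t["feedback"])
    return [q for q, fb in stats.items()
            if len(fb) >= min_runs and sum(fb) / len(fb) <= max_avg_feedback]

print(questions_to_reformulate(traces))  # ['export report']
```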

Academically, this approach aligns with the growing focus on AgentOps or Agentic AI observation systems. There is a movement to propose comprehensive observation frameworks for AgentOps, tracking various artifacts such as execution paths, internal logic, tool calls, and planning stages. Beyond black-box evaluations, the importance of inferring and optimizing behavioral patterns based on agent execution logs is increasingly emphasized.

Next-Gen Camera - EVA x Meraki

· 6 min read
Daniel Cho
Mellerikat Leader

Background

Meraki’s Cloud-Managed Service already boasts an exceptional infrastructure. If a variety of third-party apps, particularly AI-based services, could integrate seamlessly with this cloud platform, Meraki’s value could be enhanced substantially.

Currently, Meraki Cloud includes an App Store with some available apps, but it faces clear limitations:

  • Integration with Meraki Cloud Services
    • App installation and deployment are restricted. Only select partners can officially register apps, and the installation process is complex or not automated.
    • Third-party apps are not fully integrated with the Meraki Dashboard, leading to fragmented user experiences or dispersed management points.
    • Limitations in APIs and SDKs hinder sufficient integration and scalability with external services.

The integration of Meraki Smart Cameras with mellerikat EVA is therefore a highly significant case study: it both upgrades Meraki Cloud to the next level and establishes best practices for a third-party app ecosystem.

Gen AI and Domain-Specific AI

· 4 min read
Daniel Cho
Daniel Cho
Mellerikat Leader

Specialized Intelligence: The Key to Business Innovation Beyond General Intelligence

Since the digital revolution, artificial intelligence (AI) has rapidly advanced, bringing transformative changes to our daily lives and industries. The emergence of Generative AI (Gen AI) has made AI technology accessible to everyone, but it has also introduced various challenges. While a universal AI capable of excelling in all domains is an ideal goal, in reality, specialized intelligence tailored to specific fields often delivers greater value.