Infrastructure Optimization for Supporting Large-Scale Camera Environments in EVA

June 21, 2026 · 9 min read

Gyulim Gu

Tech Leader

Danbi Lee

Product Leader

EVA has evolved into an architecture that efficiently utilizes overall server resources, rather than relying solely on GPU performance, to provide AI services for up to 100 cameras on a single server.

In large-scale camera environments, simply using a higher-performance GPU is not enough. The system must maintain stable streaming for all cameras while processing AI inference requests from multiple cameras within limited CPU, memory, network, and GPU resources.

If the AI pipeline becomes biased toward a specific camera or model, some cameras may not be analyzed properly, or the delay between event occurrence and alarm delivery may increase. For this reason, EVA optimizes its infrastructure across the entire pipeline, from video ingestion and AI inference to streaming delivery.

In this article, we introduce the key infrastructure optimization technologies EVA applies to reliably support large-scale camera environments.

1. VM·VLM Separated Architecture

EVA does not analyze every video frame with a high-cost VLM. Instead, a VM (Vision Model) first checks whether the target object exists and whether basic conditions are met. Only when further reasoning is required does EVA run the VLM (Vision Language Model).

For example, in a PPE non-compliance detection scenario, frames without a person or frames unrelated to the target condition are filtered out at an early stage. VLM inference is performed only when a person is detected and the system needs to determine whether protective equipment is being worn.

Item	Description
Optimization Target	GPU
Key Approach	Use VM-based filtering first and run VLM only when needed
Effect	Improved camera capacity on the same server from around 20 cameras to up to 100 cameras

In the initial architecture, GPU computation increased rapidly because frames were processed mainly through the VLM. In the current architecture, the VM first filters the targets that require further analysis, significantly reducing the number of VLM calls.

With this structure, EVA increased the number of cameras that can be supported on the same server from around 20 to up to 100. This figure is based on internal validation comparing the initial VLM-centric structure with the current architecture, in which the VM filters frames first and the VLM is invoked only when necessary.
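The filtering order described above can be sketched as a two-stage cascade. This is a minimal illustration, not EVA's implementation: `Frame`, `vm_detect`, `fake_vlm`, and `Cascade` are hypothetical names, and a list of labels stands in for real image data and model output.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Frame:
    camera_id: str
    labels: List[str]  # stand-in for pixel data: objects a detector would find


def vm_detect(frame: Frame) -> List[str]:
    """Hypothetical cheap vision model: returns detected object labels."""
    return frame.labels


def fake_vlm(frame: Frame) -> bool:
    """Hypothetical costly VLM: raises an alarm when no helmet is visible."""
    return "helmet" not in frame.labels


class Cascade:
    """Run the cheap VM on every frame; call the VLM only on VM hits."""

    def __init__(self, target: str, vlm: Callable[[Frame], bool]):
        self.target = target
        self.vlm = vlm
        self.vlm_calls = 0

    def process(self, frame: Frame) -> bool:
        if self.target not in vm_detect(frame):
            return False  # filtered out early, no VLM cost
        self.vlm_calls += 1
        return self.vlm(frame)


frames = [
    Frame("cam-1", []),
    Frame("cam-1", ["forklift"]),
    Frame("cam-2", ["person", "helmet"]),
    Frame("cam-2", ["person"]),
]
cascade = Cascade(target="person", vlm=fake_vlm)
alarms = [cascade.process(f) for f in frames]
```

In this toy run only the two frames that contain a person reach the VLM, so half of the frames cost nothing beyond the VM pass.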

2. Task Decomposition and GPU Parallel Processing

EVA does not process inference requests from multiple cameras as one large task. Instead, it decomposes the scenario reasoning process into smaller tasks such as detection step evaluation, exception checking, image description generation, and vectorization.

These decomposed tasks are distributed and processed in parallel across multiple inference instances. This design prevents any single task from blocking the entire pipeline when requests occur simultaneously from many cameras.

Item	Description
Optimization Target	GPU
Key Approach	Decompose scenario reasoning into smaller tasks and process them in parallel
Effect	Improved scenario throughput by more than 3x, from 360 to 1,192 scenarios per hour

The key idea behind this approach is to terminate unnecessary inference early. If an early step determines that the alarm condition is not met, EVA does not proceed with additional VLM calls, image description generation, or vectorization.

As a result, GPU resources can be focused only on tasks that require actual reasoning. In real operating environments, this raised scenario throughput from 360 to 1,192 scenarios per hour, roughly a 3.3x increase.
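A rough sketch of the decomposition and early termination described in this section follows. The step functions and request fields are hypothetical. A thread pool stands in for multiple inference instances, and for brevity the sketch parallelizes whole requests rather than individual tasks.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Each step reads/updates the request state; False means "stop here".
Step = Callable[[Dict], bool]


def detection_step(state: Dict) -> bool:
    return state["person_present"]


def exception_check(state: Dict) -> bool:
    # Stop when an exception condition applies to this request.
    return not state["is_exception"]


def describe_image(state: Dict) -> bool:
    state["description"] = f"alarm on {state['camera_id']}"
    return True


def vectorize(state: Dict) -> bool:
    state["vector"] = [float(len(state["description"]))]
    return True


STEPS: List[Tuple[str, Step]] = [
    ("detect", detection_step),
    ("exception", exception_check),
    ("describe", describe_image),
    ("vectorize", vectorize),
]


def run_scenario(state: Dict) -> List[str]:
    """Run steps in order; return the names of the steps that executed."""
    executed = []
    for name, step in STEPS:
        executed.append(name)
        if not step(state):
            break  # early termination: skip all later, costlier steps
    return executed


requests = [
    {"camera_id": "cam-1", "person_present": False, "is_exception": False},
    {"camera_id": "cam-2", "person_present": True, "is_exception": True},
    {"camera_id": "cam-3", "person_present": True, "is_exception": False},
]

# Requests from different cameras are handled in parallel by a worker pool.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_scenario, requests))
```

Only the third request runs description generation and vectorization; the first two stop as soon as a step rules out an alarm.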

3. Inference Optimization Based on Detection Modes

Processing every scenario in the same way increases unnecessary GPU usage. EVA analyzes the characteristics of each user-defined scenario and automatically selects the most appropriate detection method.

Scenarios that only require simple object presence checks are handled mainly by the VM, while scenarios that require complex contextual reasoning are processed using VLM-based inference.

Detection Mode	Main Role
Simple Mode	Detects simple object presence such as people, vehicles, or equipment using the VM
Default Mode	Separates various scenarios into detection steps and exception conditions for multi-step reasoning
PPE Mode	Precisely checks whether protective equipment is worn at the worker level
Thinking Mode	Performs context-based reasoning for complex situations such as fire, falling, or risky behavior

For example, a scenario such as “Notify me when a person is visible” does not require a high-cost VLM and can be processed with the VM alone. On the other hand, a scenario such as “Notify me when a worker enters a dangerous area without wearing a safety helmet” requires additional reasoning because it must evaluate object presence, PPE status, and spatial context together.

By using only the class of model that each scenario requires, EVA reduces GPU usage while maintaining detection performance.
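One way to picture mode selection is a small dispatcher over scenario traits. The article does not describe how EVA analyzes a scenario, so everything here is an assumption made for illustration: the `ScenarioTraits` fields, the precedence order in `select_mode`, and the idea that traits arrive pre-extracted.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SIMPLE = "simple"
    DEFAULT = "default"
    PPE = "ppe"
    THINKING = "thinking"


@dataclass
class ScenarioTraits:
    """Hypothetical traits extracted from a user-defined scenario."""
    checks_ppe: bool = False
    needs_context: bool = False   # e.g. fire, falling, risky behavior
    has_conditions: bool = False  # extra detection steps or exceptions


def select_mode(traits: ScenarioTraits) -> Mode:
    """Pick the cheapest mode that still covers the scenario (assumed order)."""
    if traits.needs_context:
        return Mode.THINKING
    if traits.checks_ppe:
        return Mode.PPE
    if traits.has_conditions:
        return Mode.DEFAULT
    return Mode.SIMPLE


# "Notify me when a person is visible": plain object presence.
person_visible = select_mode(ScenarioTraits())
# A fall-detection scenario: needs context-based reasoning.
fall_detected = select_mode(ScenarioTraits(needs_context=True))
```

The point of the sketch is the fall-through: a scenario lands in Simple Mode unless one of its traits demands a heavier mode.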

4. Dynamic Worker Allocation by Model

EVA dynamically allocates Workers based on the models in use and the number of cameras assigned to each model.

For example, if many cameras are using OMDet at a certain point in time, EVA assigns more Workers to that model, while models with lower usage keep only minimal resources. This prevents GPU resources from being overly concentrated on a specific model and helps maintain stable overall GPU utilization.

Item	Description
Optimization Target	GPU, Memory
Key Approach	Adjust the number of Workers based on camera count and request volume by model
Operating Metric	Maintains frame drop rate within 10% in an internal load test with 100 cameras

If Workers are allocated statically, a model may occupy resources regardless of actual usage, or a heavily used model may not have enough processing capacity.

EVA adjusts the number of Workers based on request volume by model, keeping the share of frames selected for AI inference that are dropped because of processing delay within 10%. This figure is an operating metric tracked in an internal load test environment with 100 cameras issuing AI inference requests simultaneously.
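A minimal sketch of proportional Worker allocation, under stated assumptions: the function name, the fixed Worker budget, the per-model minimum, and the largest-remainder rounding are illustrative choices rather than EVA's actual policy, and only camera count (not request volume) is used as the weight. `model_b` and `model_c` are placeholder model names.

```python
from typing import Dict


def allocate_workers(
    cameras_per_model: Dict[str, int],
    total_workers: int,
    min_workers: int = 1,
) -> Dict[str, int]:
    """Give every model `min_workers`, then split the rest by camera share."""
    models = list(cameras_per_model)
    if total_workers < min_workers * len(models):
        raise ValueError("not enough workers for the per-model minimum")

    allocation = {m: min_workers for m in models}
    spare = total_workers - min_workers * len(models)
    total_cameras = sum(cameras_per_model.values())
    if spare == 0 or total_cameras == 0:
        return allocation

    # Largest-remainder method: floor each share, then hand the leftover
    # workers to the models with the largest fractional parts.
    shares = {m: spare * cameras_per_model[m] / total_cameras for m in models}
    for m in models:
        allocation[m] += int(shares[m])
    leftover = spare - sum(int(shares[m]) for m in models)
    by_fraction = sorted(
        models, key=lambda m: shares[m] - int(shares[m]), reverse=True
    )
    for m in by_fraction[:leftover]:
        allocation[m] += 1
    return allocation


plan = allocate_workers(
    {"OMDet": 70, "model_b": 20, "model_c": 10}, total_workers=10
)
```

Re-running the function whenever the camera mix changes gives the dynamic behavior: the heavily used model grows, while lightly used models shrink toward the minimum instead of holding idle Workers.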

5. Priority Queue-Based Request Scheduling

In large-scale camera environments, certain cameras may temporarily generate a large number of requests. If requests are processed in a simple FIFO order, one camera may occupy excessive system resources, delaying inference for other cameras.

EVA manages inference requests by camera using a priority queue so that all cameras can access the AI pipeline fairly.

Scheduling Criteria	Processing Method
Cameras with fewer processed requests	Processed first
Requests under the same condition	Processed in arrival order
When the queue is full	Older requests from cameras with excessive accumulated requests are removed first
Frames that are no longer timely	Excluded from subsequent inference

This structure prevents inference for other cameras from being delayed even when requests temporarily spike from a specific camera. It also prevents outdated frames from being processed too late and causing incorrect alarms, helping maintain the stability of the real-time AI pipeline.
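The four scheduling criteria in the table above can be sketched as a small in-memory scheduler. `FairScheduler`, `capacity`, and `max_age` are hypothetical names, and the linear scan over cameras is chosen for readability; a production queue would more likely use a heap.

```python
import itertools
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

Request = Tuple[int, float, str]  # (arrival sequence, capture time, camera id)


class FairScheduler:
    def __init__(self, capacity: int, max_age: float):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.max_age = max_age
        self.queues: Dict[str, Deque[Request]] = defaultdict(deque)
        self.processed: Dict[str, int] = defaultdict(int)
        self._seq = itertools.count()
        self.size = 0

    def push(self, camera_id: str, captured_at: float) -> None:
        if self.size >= self.capacity:
            # Queue full: evict the oldest request of the most backlogged camera.
            worst = max(self.queues, key=lambda c: len(self.queues[c]))
            self.queues[worst].popleft()
            self.size -= 1
        self.queues[camera_id].append((next(self._seq), captured_at, camera_id))
        self.size += 1

    def pop(self, now: float) -> Optional[Request]:
        while self.size:
            # Fewest processed requests first; ties broken by arrival order.
            cam = min(
                (c for c, q in self.queues.items() if q),
                key=lambda c: (self.processed[c], self.queues[c][0][0]),
            )
            req = self.queues[cam].popleft()
            self.size -= 1
            if now - req[1] > self.max_age:
                continue  # frame is no longer timely: skip inference
            self.processed[cam] += 1
            return req
        return None


sched = FairScheduler(capacity=4, max_age=2.0)
for t in (0.0, 0.1, 0.2):
    sched.push("cam-busy", t)
sched.push("cam-quiet", 0.3)
sched.push("cam-busy", 0.4)  # full: the oldest cam-busy request is evicted
order = [sched.pop(now=1.0)[2] for _ in range(4)]
```

In this run `cam-quiet` is served second even though three `cam-busy` requests are queued around it, which is the fairness property plain FIFO lacks.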

6. Dynamic FPS Control Based on Operating State

EVA dynamically adjusts video collection and processing frequency based on each camera’s operating state.

A simple connection state, an AI inference state, and a live streaming state each require a different level of processing. If all cameras are processed at the same FPS, CPU, GPU, and network resources are consumed unnecessarily.

Item	Description
Optimization Target	CPU, GPU, Network
Key Approach	Dynamically adjust collection, analysis, and streaming FPS based on camera operating state
Effect	Reduced CPU usage by approximately 35% when 50 cameras were connected in Monitoring On state

For example, in a simple connection state, EVA collects only the minimum number of frames needed to maintain the connection. When AI inference is required, only the frames needed for analysis are selected and processed. Higher-FPS streaming is performed only when a user is watching the live video.

In particular, EVA does not decode every collected frame or use every frame for AI analysis. It selectively extracts only the frames needed for inference, minimizing unnecessary video collection, decoding, preprocessing, and inference operations.

Internal validation showed that, under the same server specifications, CPU usage for 50 cameras in Monitoring On state was reduced from around 100% to around 65%, resulting in an approximately 35% reduction in CPU resource usage.
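A simplified sketch of state-based frame selection follows. The state names and the target rates (1, 5, and 15 FPS) are illustrative assumptions, not EVA's settings. The sketch also selects by frame index only and does not model skipping the decode step itself.

```python
from enum import Enum
from typing import Iterable, Iterator


class CameraState(Enum):
    CONNECTED = "connected"  # connection keep-alive only
    INFERENCE = "inference"  # frames are needed for AI analysis
    LIVE = "live"            # a user is watching the stream


# Hypothetical target rates; real values are deployment-specific.
TARGET_FPS = {
    CameraState.CONNECTED: 1,
    CameraState.INFERENCE: 5,
    CameraState.LIVE: 15,
}


def select_frames(
    frame_indices: Iterable[int], source_fps: int, state: CameraState
) -> Iterator[int]:
    """Yield only the frame indices needed to hit the state's target FPS."""
    target = min(TARGET_FPS[state], source_fps)
    stride = max(1, source_fps // target)
    for i in frame_indices:
        if i % stride == 0:
            yield i


one_second = range(30)  # indices covering one second of a 30 FPS stream
kept = {s: list(select_frames(one_second, 30, s)) for s in CameraState}
```

With these assumed rates, a camera that is only connected touches 1 of every 30 frames, and one under AI inference touches 5, so the heavier per-frame work is reserved for streams someone is actually watching.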

7. Multi-User Streaming Optimization

In monitoring environments, multiple users often watch the same camera stream at the same time.

In a conventional structure, as the number of users increases, decoding, preprocessing, and encoding may be repeatedly performed for the same video stream, significantly increasing CPU and memory usage. EVA reduces this overhead by sharing video processing results for the same camera.

Item	Description
Optimization Target	CPU, Memory, Network
Key Approach	Decode, preprocess, and encode the same camera stream once, then share it with multiple users
Effect	Prevents CPU and memory usage from increasing linearly with the number of users

EVA performs decoding, preprocessing, and encoding only once for the same camera, then shares the generated stream with multiple users. As a result, even when the number of users increases, the additional cost is mainly limited to network transmission.

This allows EVA to maintain stable streaming quality while minimizing server resource usage, even when many users monitor video streams simultaneously.
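The encode-once, share-with-many idea can be sketched as a per-camera fan-out. `SharedStream` and `StreamHub` are hypothetical names, and a byte prefix stands in for the real decode, preprocess, and encode work.

```python
from typing import Callable, Dict, List


class SharedStream:
    """One processing pipeline per camera, fanned out to many viewers."""

    def __init__(self, camera_id: str):
        self.camera_id = camera_id
        self.viewers: Dict[str, Callable[[bytes], None]] = {}
        self.encode_count = 0

    def _encode(self, raw_frame: bytes) -> bytes:
        # Stand-in for decode + preprocess + encode; done once per frame.
        self.encode_count += 1
        return b"enc:" + raw_frame

    def publish(self, raw_frame: bytes) -> None:
        if not self.viewers:
            return  # nobody is watching: skip the work entirely
        packet = self._encode(raw_frame)
        for send in self.viewers.values():
            send(packet)  # per-viewer cost is only the transmission


class StreamHub:
    def __init__(self) -> None:
        self.streams: Dict[str, SharedStream] = {}

    def subscribe(
        self, camera_id: str, viewer_id: str, send: Callable[[bytes], None]
    ) -> SharedStream:
        stream = self.streams.setdefault(camera_id, SharedStream(camera_id))
        stream.viewers[viewer_id] = send
        return stream

    def unsubscribe(self, camera_id: str, viewer_id: str) -> None:
        stream = self.streams.get(camera_id)
        if stream is None:
            return
        stream.viewers.pop(viewer_id, None)
        if not stream.viewers:
            del self.streams[camera_id]  # last viewer left: release pipeline


hub = StreamHub()
inboxes: Dict[str, List[bytes]] = {"alice": [], "bob": [], "carol": []}
for user, inbox in inboxes.items():
    stream = hub.subscribe("cam-1", user, inbox.append)
for frame in (b"f0", b"f1"):
    stream.publish(frame)
```

Three viewers receive both frames, yet the encode step runs only twice, once per frame; adding a fourth viewer would add sends but no encodes.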

Closing

EVA’s infrastructure efficiency does not depend solely on single-GPU performance. It is the result of optimizing the entire pipeline, including video ingestion, frame selection, AI inference, request scheduling, and streaming delivery.

EVA combines the following technologies to reliably support more cameras in the same server environment.

Optimization Technology	Main Effect
VM·VLM Separation	Minimizes high-cost VLM calls
Task Decomposition and Parallel Processing	Improves scenario throughput by more than 3x
Detection Mode-Based Optimization	Uses models according to scenario characteristics
Dynamic Worker Allocation	Optimizes GPU and memory usage based on request volume by model
Priority Queue	Provides fair inference opportunities across cameras
Dynamic FPS Control	Reduces CPU, GPU, and network usage
Multi-User Streaming Optimization	Eliminates redundant decoding and encoding

Ultimately, EVA’s large-scale camera capacity is not achieved by a single optimization technique. It is the result of designing the AI model structure and system infrastructure together to use limited server resources more efficiently.

EVA will continue to enhance its infrastructure optimization technologies to reliably support more cameras, more complex scenarios, and more diverse operating environments.
