
Infrastructure Optimization for Supporting Large-Scale Camera Environments in EVA

Gyulim Gu
Tech Leader
Danbi Lee
Product Leader

EVA has evolved into an architecture that efficiently utilizes overall server resources, rather than relying solely on GPU performance, to provide AI services for more than 100 cameras on a single server.

In large-scale camera environments, simply using a higher-performance GPU is not enough. The system must maintain stable streaming for all cameras while processing AI inference requests from multiple cameras within limited CPU, memory, network, and GPU resources.

If the AI pipeline becomes biased toward a specific camera or model, some cameras may not be analyzed properly, or the delay between event occurrence and alarm delivery may increase. For this reason, EVA optimizes its infrastructure across the entire pipeline, from video ingestion and AI inference to streaming delivery.

In this article, we introduce the key infrastructure optimization technologies EVA applies to reliably support large-scale camera environments.




1. VM·VLM Separated Architecture

EVA does not analyze every video frame with a high-cost VLM (Vision Language Model). Instead, a VM (Vision Model) first checks whether the target object exists and whether basic conditions are met. EVA runs the VLM only when further reasoning is required.

For example, in a PPE non-compliance detection scenario, frames without a person or frames unrelated to the target condition are filtered out at an early stage. VLM inference is performed only when a person is detected and the system needs to determine whether protective equipment is being worn.

| Item | Description |
| --- | --- |
| Optimization Target | GPU |
| Key Approach | Use VM-based filtering first and run VLM only when needed |
| Effect | Improved camera capacity on the same server from around 20 cameras to up to 100 cameras |

In the initial architecture, GPU computation increased rapidly because every frame was processed mainly through the VLM. In the current architecture, the VM first filters the targets that require further analysis, significantly reducing the number of VLM calls.

With this structure, EVA increased the number of cameras that can be supported on the same server from around 20 to up to 100. This figure is based on internal validation comparing the initial VLM-centric structure with the current architecture, which applies VM filtering first and invokes the VLM only when necessary.
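The gating idea can be sketched as a two-stage pipeline. This is a minimal illustration, not EVA's implementation: `Frame`, `vm_has_person`, `vlm_ppe_violation`, and `GatedPipeline` are hypothetical names, and both model calls are replaced by trivial stand-ins so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Frame:
    camera_id: str
    objects: List[str]  # stand-in for what a real detector would find


def vm_has_person(frame: Frame) -> bool:
    """Cheap first stage: stand-in for a VM (object detector) call."""
    return "person" in frame.objects


def vlm_ppe_violation(frame: Frame) -> bool:
    """Expensive second stage: stand-in for a VLM call. True means a violation."""
    return "helmet" not in frame.objects


class GatedPipeline:
    """Runs the expensive stage only for frames that pass the cheap stage."""

    def __init__(self, vm_check: Callable[[Frame], bool],
                 vlm_infer: Callable[[Frame], bool]) -> None:
        self.vm_check = vm_check
        self.vlm_infer = vlm_infer
        self.vlm_calls = 0  # counts how often the expensive stage actually ran

    def process(self, frame: Frame) -> Optional[bool]:
        if not self.vm_check(frame):
            return None  # filtered out early, no VLM cost
        self.vlm_calls += 1
        return self.vlm_infer(frame)


pipeline = GatedPipeline(vm_has_person, vlm_ppe_violation)
frames = [
    Frame("cam-1", []),
    Frame("cam-2", ["forklift"]),
    Frame("cam-3", ["person", "helmet"]),
    Frame("cam-4", ["person"]),
]
results = [pipeline.process(f) for f in frames]
```

In this toy run only the two frames containing a person reach the second stage, which is the property that lets the expensive model serve more cameras.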




2. Task Decomposition and GPU Parallel Processing

EVA does not process inference requests from multiple cameras as one large task. Instead, it decomposes the scenario reasoning process into smaller tasks such as detection step evaluation, exception checking, image description generation, and vectorization.

These decomposed tasks are distributed and processed in parallel across multiple inference instances. This design prevents any single task from blocking the entire pipeline when requests occur simultaneously from many cameras.

| Item | Description |
| --- | --- |
| Optimization Target | GPU |
| Key Approach | Decompose scenario reasoning into smaller tasks and process them in parallel |
| Effect | Improved scenario throughput by more than 3x, from 360 to 1,192 scenarios per hour |

The key idea behind this approach is to terminate unnecessary inference early. If an early step determines that the alarm condition is not met, EVA does not proceed with additional VLM calls, image description generation, or vectorization.

As a result, GPU resources can be focused only on tasks that require actual reasoning. In real operating environments, this improved scenario throughput from 360 to 1,192 scenarios per hour, roughly a 3.3x increase.
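A minimal sketch of decomposition with early termination, under stated assumptions: the step names follow the tasks listed above, but `run_scenario`, `run_all`, the request fields, and the use of a thread pool in place of multiple inference instances are all hypothetical simplifications.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Each step returns True to continue and False to stop the scenario early.
Step = Tuple[str, Callable[[Dict], bool]]

STEPS: List[Step] = [
    ("detection", lambda r: r["person_present"]),
    ("exception_check", lambda r: not r["is_exception"]),
    ("describe", lambda r: True),   # stand-in for image description generation
    ("vectorize", lambda r: True),  # stand-in for vectorization
]


def run_scenario(request: Dict, steps: List[Step]) -> Dict:
    """Run steps in order and stop as soon as one rules out the alarm."""
    executed = []
    for name, step in steps:
        executed.append(name)
        if not step(request):
            return {"camera": request["camera"], "alarm": False, "executed": executed}
    return {"camera": request["camera"], "alarm": True, "executed": executed}


def run_all(requests: List[Dict], max_workers: int = 4) -> List[Dict]:
    """Process requests from many cameras concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda r: run_scenario(r, STEPS), requests))


requests = [
    {"camera": "cam-1", "person_present": False, "is_exception": False},
    {"camera": "cam-2", "person_present": True, "is_exception": True},
    {"camera": "cam-3", "person_present": True, "is_exception": False},
]
outcomes = run_all(requests)
```

Here the request from `cam-1` stops after the first step and never reaches the description or vectorization stand-ins, so the later, more expensive steps run only for `cam-3`.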




3. Inference Optimization Based on Detection Modes

Processing every scenario in the same way increases unnecessary GPU usage. EVA analyzes the characteristics of each user-defined scenario and automatically selects the most appropriate detection method.

Scenarios that only require simple object presence checks are handled mainly by the VM, while scenarios that require complex contextual reasoning are processed using VLM-based inference.

| Detection Mode | Main Role |
| --- | --- |
| Simple Mode | Detects simple object presence such as people, vehicles, or equipment using the VM |
| Default Mode | Separates various scenarios into detection steps and exception conditions for multi-step reasoning |
| PPE Mode | Precisely checks whether protective equipment is worn at the worker level |
| Thinking Mode | Performs context-based reasoning for complex situations such as fire, falling, or risky behavior |

For example, a scenario such as “Notify me when a person is visible” does not require a high-cost VLM and can be processed with the VM alone. On the other hand, a scenario such as “Notify me when a worker enters a dangerous area without wearing a safety helmet” requires additional reasoning because it must evaluate object presence, PPE status, and spatial context together.

By using only the level of model required for each scenario, EVA reduces GPU usage while maintaining detection performance.
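How EVA actually classifies a scenario is not described in this article. As a purely illustrative toy, the sketch below picks one of the four modes from keywords in the scenario text; the keyword lists, the `has_exception_conditions` flag, and the `select_mode` function are all assumptions made for the example.

```python
from enum import Enum


class DetectionMode(Enum):
    SIMPLE = "simple"
    DEFAULT = "default"
    PPE = "ppe"
    THINKING = "thinking"


# Hypothetical keyword lists; a real system would need a far more robust analysis.
PPE_TERMS = ("helmet", "protective equipment", "safety vest")
THINKING_TERMS = ("fire", "fall", "risky")


def select_mode(scenario: str, has_exception_conditions: bool = False) -> DetectionMode:
    """Pick the cheapest mode that plausibly covers the scenario text."""
    text = scenario.lower()
    if any(term in text for term in PPE_TERMS):
        return DetectionMode.PPE
    if any(term in text for term in THINKING_TERMS):
        return DetectionMode.THINKING
    if has_exception_conditions:
        return DetectionMode.DEFAULT
    return DetectionMode.SIMPLE


simple = select_mode("Notify me when a person is visible")
ppe = select_mode(
    "Notify me when a worker enters a dangerous area without wearing a safety helmet"
)
```

With these toy rules the first scenario maps to Simple Mode and the second to PPE Mode, matching the two examples given above.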




4. Dynamic Worker Allocation by Model

EVA dynamically allocates Workers based on the models in use and the number of cameras assigned to each model.

For example, if many cameras are using OMDet at a certain point in time, EVA assigns more Workers to that model. Models with lower usage are kept with minimal resources. This prevents GPU resources from being overly concentrated on a specific model and helps maintain stable overall GPU utilization.

| Item | Description |
| --- | --- |
| Optimization Target | GPU, Memory |
| Key Approach | Adjust the number of Workers based on camera count and request volume by model |
| Operating Metric | Maintains frame drop rate within 10% in an internal load test with 100 cameras |

If Workers are allocated statically, a model may occupy resources regardless of actual usage, or a heavily used model may not have enough processing capacity.

EVA adjusts the number of Workers based on request volume by model, keeping the frame drop rate (the share of frames targeted for AI inference that are missed because of processing delay) within 10%. This figure is an operating metric managed under an internal load test environment with 100 cameras and simultaneous AI inference requests.
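One way to express load-proportional allocation is sketched below. It is an assumption-laden illustration rather than EVA's allocator: `allocate_workers`, the per-model minimum of one Worker, and the largest-remainder rounding are choices made for the example, and `model_b` and `model_c` are placeholder model names.

```python
from typing import Dict


def allocate_workers(requests_per_model: Dict[str, int], total_workers: int,
                     min_per_model: int = 1) -> Dict[str, int]:
    """Give every model a minimum, then split the rest in proportion to load."""
    models = list(requests_per_model)
    reserved = min_per_model * len(models)
    if total_workers < reserved:
        raise ValueError("not enough workers for the per-model minimum")

    alloc = {m: min_per_model for m in models}
    spare = total_workers - reserved
    total_load = sum(requests_per_model.values())
    if spare == 0 or total_load == 0:
        return alloc

    # Largest-remainder rounding: take the floor of each proportional share,
    # then hand leftover workers to the models with the biggest fractional part.
    shares = {m: spare * requests_per_model[m] / total_load for m in models}
    for m in models:
        alloc[m] += int(shares[m])
    leftover = total_workers - sum(alloc.values())
    by_fraction = sorted(models, key=lambda m: shares[m] - int(shares[m]), reverse=True)
    for m in by_fraction[:leftover]:
        alloc[m] += 1
    return alloc


plan = allocate_workers({"OMDet": 60, "model_b": 30, "model_c": 10}, total_workers=10)
```

Re-running the function whenever the per-model request counts change gives a simple form of dynamic reallocation: busy models gain Workers, while idle ones fall back to the minimum.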




5. Priority Queue-Based Request Scheduling

In large-scale camera environments, certain cameras may temporarily generate a large number of requests. If requests are processed in a simple FIFO order, one camera may occupy excessive system resources, delaying inference for other cameras.

EVA manages inference requests by camera using a priority queue so that all cameras can access the AI pipeline fairly.

| Scheduling Criteria | Processing Method |
| --- | --- |
| Cameras with fewer processed requests | Processed first |
| Requests under the same condition | Processed in arrival order |
| When the queue is full | Older requests from cameras with excessive accumulated requests are removed first |
| Frames that are no longer timely | Excluded from subsequent inference |

This structure prevents inference for other cameras from being delayed even when requests temporarily spike from a specific camera. It also prevents outdated frames from being processed too late and causing incorrect alarms, helping maintain the stability of the real-time AI pipeline.
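The four scheduling rules in the table can be sketched with per-camera queues. This is a simplified illustration, not EVA's scheduler: `FairQueue`, its `capacity` and `max_age` parameters, and the choice to treat the camera with the largest backlog as the one with "excessive accumulated requests" are assumptions.

```python
import itertools
from collections import defaultdict, deque
from typing import Any, Optional, Tuple


class FairQueue:
    """Serves the camera with the fewest processed requests first."""

    def __init__(self, capacity: int, max_age: float) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.max_age = max_age              # seconds after which a frame is stale
        self.pending = defaultdict(deque)   # camera_id -> deque of (seq, ts, frame)
        self.processed = defaultdict(int)   # camera_id -> requests served so far
        self._seq = itertools.count()       # global arrival order for tie-breaking
        self._size = 0

    def push(self, camera_id: str, frame: Any, now: float) -> None:
        if self._size >= self.capacity:
            # Queue full: drop the oldest request of the camera with the largest backlog.
            busiest = max(self.pending, key=lambda c: len(self.pending[c]))
            self.pending[busiest].popleft()
            self._size -= 1
        self.pending[camera_id].append((next(self._seq), now, frame))
        self._size += 1

    def pop(self, now: float) -> Optional[Tuple[str, Any]]:
        while self._size:
            candidates = [c for c, q in self.pending.items() if q]
            # Fewest processed requests first; ties broken by arrival order.
            cam = min(candidates, key=lambda c: (self.processed[c], self.pending[c][0][0]))
            _, ts, frame = self.pending[cam].popleft()
            self._size -= 1
            if now - ts > self.max_age:
                continue  # frame is no longer timely, skip inference
            self.processed[cam] += 1
            return cam, frame
        return None


q = FairQueue(capacity=4, max_age=5.0)
for name in ("a0", "a1", "a2"):
    q.push("A", name, now=0.0)   # camera A floods the queue
q.push("B", "b0", now=0.0)
q.push("B", "b1", now=1.0)       # queue is full, so A's oldest request is evicted
order = [q.pop(now=1.0) for _ in range(5)]
```

Even though camera A submitted more requests, the pops alternate between A and B, and the final pop returns `None` once the queue is empty.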




6. Dynamic FPS Control Based on Operating State

EVA dynamically adjusts video collection and processing frequency based on each camera’s operating state.

A simple connection state, an AI inference state, and a live streaming state all require different levels of processing. If all cameras are processed at the same FPS, unnecessary CPU, GPU, and network usage increases.

| Item | Description |
| --- | --- |
| Optimization Target | CPU, GPU, Network |
| Key Approach | Dynamically adjust collection, analysis, and streaming FPS based on camera operating state |
| Effect | Reduced CPU usage by approximately 35% when 50 cameras were connected in Monitoring On state |

For example, in a simple connection state, EVA collects only the minimum number of frames needed to maintain the connection. When AI inference is required, only the frames needed for analysis are selected and processed. Higher-FPS streaming is performed only when a user is watching the live video.

In particular, EVA does not decode every collected frame or use every frame for AI analysis. It selectively extracts only the frames needed for inference, minimizing unnecessary video collection, decoding, preprocessing, and inference operations.

Internal validation showed that, under the same server specifications, CPU usage for 50 cameras in Monitoring On state was reduced from around 100% to around 65%, resulting in an approximately 35% reduction in CPU resource usage.
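State-dependent frame selection can be sketched as follows. The three states mirror the ones described above, but the FPS values in `TARGET_FPS`, the `camera_state` rule, and the `select_frames` helper are hypothetical; the article does not state the actual rates EVA uses.

```python
from enum import Enum
from typing import List


class CameraState(Enum):
    CONNECTED = "connected"   # connection only
    INFERENCE = "inference"   # AI analysis enabled
    LIVE = "live"             # at least one user is watching


# Hypothetical target rates per state, in frames per second.
TARGET_FPS = {
    CameraState.CONNECTED: 1,
    CameraState.INFERENCE: 5,
    CameraState.LIVE: 15,
}


def camera_state(ai_enabled: bool, viewers: int) -> CameraState:
    """Derive the operating state from what the camera is currently used for."""
    if viewers > 0:
        return CameraState.LIVE
    if ai_enabled:
        return CameraState.INFERENCE
    return CameraState.CONNECTED


def select_frames(num_frames: int, source_fps: int, target_fps: int) -> List[int]:
    """Return indices of frames to keep so about target_fps of source_fps survive."""
    if target_fps >= source_fps:
        return list(range(num_frames))
    kept = []
    credit = 0
    for index in range(num_frames):
        credit += target_fps
        if credit >= source_fps:  # enough credit accumulated to emit one frame
            credit -= source_fps
            kept.append(index)
    return kept


state = camera_state(ai_enabled=True, viewers=0)
kept = select_frames(num_frames=30, source_fps=30, target_fps=TARGET_FPS[state])
```

Only the selected indices would go on to decoding, preprocessing, and inference; every other frame in that second is skipped.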




7. Multi-User Streaming Optimization

In monitoring environments, multiple users often watch the same camera stream at the same time.

In a conventional structure, as the number of users increases, decoding, preprocessing, and encoding may be repeatedly performed for the same video stream, significantly increasing CPU and memory usage. EVA reduces this overhead by sharing video processing results for the same camera.

| Item | Description |
| --- | --- |
| Optimization Target | CPU, Memory, Network |
| Key Approach | Decode, preprocess, and encode the same camera stream once, then share it with multiple users |
| Effect | Prevents CPU and memory usage from increasing linearly with the number of users |

EVA performs decoding, preprocessing, and encoding only once for the same camera, then shares the generated stream with multiple users. As a result, even when the number of users increases, the additional cost is mainly limited to network transmission.

This allows EVA to maintain stable streaming quality while minimizing server resource usage, even when many users monitor video streams simultaneously.
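The sharing pattern can be sketched as a small fan-out hub. This is an illustrative sketch only: `SharedStream`, its `encode` callback, and the list-based viewer sinks are hypothetical stand-ins for a real decode, preprocess, and encode chain and for real network connections.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class SharedStream:
    """Encodes each camera frame once and fans the result out to all viewers."""

    def __init__(self, encode: Callable[[Any], Any]) -> None:
        self.encode = encode  # stand-in for decode + preprocess + encode
        self.subscribers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)
        self.encode_calls = 0

    def subscribe(self, camera_id: str, sink: Callable[[Any], None]) -> None:
        self.subscribers[camera_id].append(sink)

    def publish(self, camera_id: str, raw_frame: Any) -> None:
        sinks = self.subscribers.get(camera_id, [])
        if not sinks:
            return  # nobody is watching, skip the encoding work entirely
        self.encode_calls += 1
        packet = self.encode(raw_frame)  # done once per frame, not once per viewer
        for sink in sinks:
            sink(packet)                 # per-viewer cost is delivery only


hub = SharedStream(encode=lambda frame: f"encoded:{frame}")
viewers: List[List[str]] = [[], [], []]
for inbox in viewers:
    hub.subscribe("cam-1", inbox.append)
hub.publish("cam-1", "frame-0")
hub.publish("cam-1", "frame-1")
hub.publish("cam-2", "frame-0")  # no subscribers, so nothing is encoded
```

With three viewers and two published frames, the encoder runs twice rather than six times, so adding viewers increases delivery work but not encoding work.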




Closing

EVA’s infrastructure efficiency does not depend solely on single-GPU performance. It is the result of optimizing the entire pipeline, including video ingestion, frame selection, AI inference, request scheduling, and streaming delivery.

EVA combines the following technologies to reliably support more cameras on the same server environment.

| Optimization Technology | Main Effect |
| --- | --- |
| VM·VLM Separation | Minimizes high-cost VLM calls |
| Task Decomposition and Parallel Processing | Improves scenario throughput by more than 3x |
| Detection Mode-Based Optimization | Uses models according to scenario characteristics |
| Dynamic Worker Allocation | Optimizes GPU and memory usage based on request volume by model |
| Priority Queue | Provides fair inference opportunities across cameras |
| Dynamic FPS Control | Reduces CPU, GPU, and network usage |
| Multi-User Streaming Optimization | Eliminates redundant decoding and encoding |

Ultimately, EVA’s large-scale camera capacity is not achieved by a single optimization technique. It is the result of designing the AI model structure and system infrastructure together to use limited server resources more efficiently.

EVA will continue to enhance its infrastructure optimization technologies to reliably support more cameras, more complex scenarios, and more diverse operating environments.