Infrastructure Optimization for Supporting Large-Scale Camera Environments in EVA
EVA has evolved into an architecture that efficiently utilizes overall server resources, rather than relying solely on GPU performance, to provide AI services for more than 100 cameras on a single server.
In large-scale camera environments, simply using a higher-performance GPU is not enough. The system must maintain stable streaming for all cameras while processing AI inference requests from multiple cameras within limited CPU, memory, network, and GPU resources.
If the AI pipeline becomes biased toward a specific camera or model, some cameras may not be analyzed properly, or the delay between event occurrence and alarm delivery may increase. For this reason, EVA optimizes its infrastructure across the entire pipeline, from video ingestion and AI inference to streaming delivery.
In this article, we introduce the key infrastructure optimization technologies EVA applies to reliably support large-scale camera environments.
1. VM·VLM Separated Architecture
EVA does not analyze every video frame with a high-cost VLM. Instead, a VM (Vision Model) first checks whether the target object exists and whether basic conditions are met. EVA runs the VLM (Vision Language Model) only when further reasoning is required.
For example, in a PPE non-compliance detection scenario, frames without a person or frames unrelated to the target condition are filtered out at an early stage. VLM inference is performed only when a person is detected and the system needs to determine whether protective equipment is being worn.
| Item | Description |
|---|---|
| Optimization Target | GPU |
| Key Approach | Use VM-based filtering first and run VLM only when needed |
| Effect | Improved camera capacity on the same server from around 20 cameras to up to 100 cameras |
In the initial architecture, GPU computation increased rapidly because every frame was processed mainly through the VLM. In the current architecture, the VM first filters the targets that require further analysis, significantly reducing the number of VLM calls.
With this structure, EVA increased the number of cameras that can be supported on the same server from around 20 to up to 100. This figure comes from internal validation comparing the initial VLM-centric structure with the current architecture, in which VM filtering runs first and the VLM is invoked only when necessary.
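The gating idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not EVA's implementation: `vm_detect` and `vlm_reason` are hypothetical stubs standing in for a lightweight detector and a VLM call, and the `Frame` type simply carries the labels a detector would return. What matters is the control flow: every frame costs one VM call, but only frames containing the target object reach the VLM.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    camera_id: str
    labels: list  # labels a VM detector would return (stubbed for this sketch)


@dataclass
class GateStats:
    vm_calls: int = 0
    vlm_calls: int = 0


def vm_detect(frame: Frame) -> list:
    """Hypothetical lightweight Vision Model: returns detected object labels."""
    return frame.labels


def vlm_reason(frame: Frame, question: str) -> bool:
    """Hypothetical Vision Language Model call (expensive); stubbed to 'no violation'."""
    return False


def process_frame(frame: Frame, target: str, question: str, stats: GateStats):
    # Stage 1: the cheap VM check runs on every frame.
    stats.vm_calls += 1
    if target not in vm_detect(frame):
        return None  # filtered out early; the VLM is never invoked
    # Stage 2: expensive VLM reasoning only for frames that passed the VM filter.
    stats.vlm_calls += 1
    return vlm_reason(frame, question)


frames = [
    Frame("cam-1", []),
    Frame("cam-2", ["vehicle"]),
    Frame("cam-3", ["person"]),
    Frame("cam-4", []),
]
stats = GateStats()
for f in frames:
    process_frame(f, "person", "Is this person wearing a safety helmet?", stats)
print(stats.vm_calls, stats.vlm_calls)  # 4 1
```

In this toy run, four frames produce four VM calls but only one VLM call.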
2. Task Decomposition and GPU Parallel Processing
EVA does not process inference requests from multiple cameras as one large task. Instead, it decomposes the scenario reasoning process into smaller tasks such as detection step evaluation, exception checking, image description generation, and vectorization.
These decomposed tasks are distributed and processed in parallel across multiple inference instances. This design prevents any single task from blocking the entire pipeline when requests occur simultaneously from many cameras.
| Item | Description |
|---|---|
| Optimization Target | GPU |
| Key Approach | Decompose scenario reasoning into smaller tasks and process them in parallel |
| Effect | Improved scenario throughput by more than 3x, from 360 to 1,192 scenarios per hour |
The key idea behind this approach is to terminate unnecessary inference early. If an early step determines that the alarm condition is not met, EVA does not proceed with additional VLM calls, image description generation, or vectorization.
As a result, GPU resources can be focused only on tasks that require actual reasoning. In real operating environments, this improved scenario throughput from 360 to 1,192 scenarios per hour, more than a 3x increase.
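The decomposition and early-exit behavior can be sketched with `asyncio`. This sketch assumes hypothetical stage functions (`evaluate_detection_step`, `check_exception`, `describe_image`, `vectorize`) with toy logic in place of real inference calls. It models the pool of inference instances as an `asyncio.Semaphore`. Each stage acquires an instance slot separately, so short stages from different cameras can interleave instead of one long scenario holding an instance. A scenario that fails an early check never reaches the description or vectorization stages.

```python
import asyncio


# Hypothetical stage functions with toy logic; in a real system each would be
# a separate request to an inference instance.
async def evaluate_detection_step(frame: dict) -> bool:
    await asyncio.sleep(0)  # placeholder for a GPU call
    return frame["person"] and not frame["helmet"]


async def check_exception(frame: dict) -> bool:
    await asyncio.sleep(0)
    return frame.get("authorized", False)


async def describe_image(frame: dict) -> str:
    await asyncio.sleep(0)
    return f"person without helmet at {frame['camera']}"


async def vectorize(text: str) -> list:
    await asyncio.sleep(0)
    return [float(len(text))]  # toy embedding


async def run_scenario(frame: dict, pool: asyncio.Semaphore, calls: list):
    # Each stage takes its own slot, so stages from different cameras interleave.
    async with pool:
        calls.append("detect")
        if not await evaluate_detection_step(frame):
            return None  # early exit: alarm condition not met
    async with pool:
        calls.append("exception")
        if await check_exception(frame):
            return None  # early exit: an exception condition applies
    async with pool:
        calls.append("describe")
        description = await describe_image(frame)
    async with pool:
        calls.append("vectorize")
        vector = await vectorize(description)
    return {"camera": frame["camera"], "description": description, "vector": vector}


async def main(frames: list, num_instances: int = 4):
    pool = asyncio.Semaphore(num_instances)  # stands in for parallel inference instances
    calls: list = []
    results = await asyncio.gather(*(run_scenario(f, pool, calls) for f in frames))
    return results, calls


frames = [
    {"camera": "cam-1", "person": False, "helmet": False},
    {"camera": "cam-2", "person": True, "helmet": True},
    {"camera": "cam-3", "person": True, "helmet": False},
    {"camera": "cam-4", "person": True, "helmet": False, "authorized": True},
]
results, calls = asyncio.run(main(frames))
alarms = [r for r in results if r is not None]
print([a["camera"] for a in alarms], calls.count("detect"), calls.count("vectorize"))  # ['cam-3'] 4 1
```

Only the frame that passes both the detection step and the exception check (`cam-3`) triggers image description and vectorization. The other three scenarios stop after one or two lightweight stages.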
3. Inference Optimization Based on Detection Modes
Processing every scenario in the same way increases unnecessary GPU usage. EVA analyzes the characteristics of each user-defined scenario and automatically selects the most appropriate detection method.
Scenarios that only require simple object presence checks are handled mainly by the VM, while scenarios that require complex contextual reasoning are processed using VLM-based inference.
| Detection Mode | Main Role |
|---|---|
| Simple Mode | Detects simple object presence such as people, vehicles, or equipment using the VM |
| Default Mode | Separates various scenarios into detection steps and exception conditions for multi-step reasoning |
| PPE Mode | Precisely checks whether protective equipment is worn at the worker level |
| Thinking Mode | Performs context-based reasoning for complex situations such as fire, falling, or risky behavior |
For example, a scenario such as “Notify me when a person is visible” does not require a high-cost VLM and can be processed with the VM alone. On the other hand, a scenario such as “Notify me when a worker enters a dangerous area without wearing a safety helmet” requires additional reasoning because it must evaluate object presence, PPE status, and spatial context together.
By using only the level of model required for each scenario, EVA reduces GPU usage while maintaining detection performance.
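The mode selection can be illustrated with a deliberately simple rule-based selector. The keyword lists, the rule order, and the `select_mode` function are hypothetical, because the article does not describe how EVA analyzes scenario text. A production selector would likely use richer analysis than substring matching. The sketch only shows the idea of routing each scenario to the cheapest mode that can handle it.

```python
from enum import Enum


class DetectionMode(Enum):
    SIMPLE = "simple"      # VM-only object presence
    DEFAULT = "default"    # detection steps + exception conditions
    PPE = "ppe"            # worker-level protective equipment checks
    THINKING = "thinking"  # context-based reasoning for complex situations


# Hypothetical keyword rules, checked from the most to the least demanding mode.
THINKING_TERMS = ("fire", "smoke", "fall", "risky")
PPE_TERMS = ("helmet", "vest", "gloves", "mask", "protective")
CONDITION_TERMS = ("without", "enters", "area", "unless", "while")


def select_mode(scenario: str) -> DetectionMode:
    text = scenario.lower()
    if any(term in text for term in THINKING_TERMS):
        return DetectionMode.THINKING
    if any(term in text for term in PPE_TERMS):
        return DetectionMode.PPE
    if any(term in text for term in CONDITION_TERMS):
        return DetectionMode.DEFAULT
    return DetectionMode.SIMPLE  # plain presence check: the VM alone is enough


for s in (
    "Notify me when a person is visible",
    "Notify me when a worker enters a dangerous area without wearing a safety helmet",
    "Notify me when a vehicle enters the restricted area",
    "Notify me when a fire starts near the loading dock",
):
    print(select_mode(s).name, "<-", s)
```

Under these assumed rules, “Notify me when a person is visible” falls through to Simple Mode. The helmet scenario is routed to PPE Mode because it mentions protective equipment.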
4. Dynamic Worker Allocation by Model
EVA dynamically allocates Workers based on the models in use and the number of cameras assigned to each model.
For example, if many cameras are using OMDet at a given time, EVA assigns more Workers to that model, while models with lower usage are kept at a minimal allocation. This prevents GPU resources from being overly concentrated on a single model and helps keep overall GPU utilization stable.
| Item | Description |
|---|---|
| Optimization Target | GPU, Memory |
| Key Approach | Adjust the number of Workers based on camera count and request volume by model |
| Operating Metric | Maintains frame drop rate within 10% in an internal load test with 100 cameras |
If Workers are allocated statically, a model may occupy resources regardless of actual usage, or a heavily used model may not have enough processing capacity.
EVA adjusts the number of Workers based on request volume by model, keeping the share of inference-target frames that are missed due to processing delay within 10%. This figure is an operating metric measured in an internal load test environment with 100 cameras issuing AI inference requests simultaneously.
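One way to implement proportional allocation is sketched below. It makes several assumptions: a fixed total Worker budget, a minimum of one Worker per loaded model, and camera count as the load signal. It distributes the remaining Workers with the largest-remainder method. The function name, the budget, and the model names other than OMDet are hypothetical. Re-running the function whenever camera assignments change produces the dynamic behavior described above.

```python
def allocate_workers(load: dict, total_workers: int, min_workers: int = 1) -> dict:
    """Split a fixed Worker budget across models in proportion to their camera load.

    Every model keeps at least `min_workers`; the spare Workers are divided with
    the largest-remainder method using exact integer arithmetic.
    """
    base = min_workers * len(load)
    if base > total_workers:
        raise ValueError("Worker budget is smaller than the per-model minimum")
    spare = total_workers - base
    total_load = sum(load.values())
    if total_load == 0:
        return {model: min_workers for model in load}  # idle: keep minimums only

    alloc, remainders = {}, {}
    for model, cameras in load.items():
        whole, rem = divmod(spare * cameras, total_load)
        alloc[model] = min_workers + whole
        remainders[model] = rem

    # Hand the few Workers lost to rounding to the largest fractional remainders.
    leftover = total_workers - sum(alloc.values())
    for model in sorted(load, key=lambda m: remainders[m], reverse=True)[:leftover]:
        alloc[model] += 1
    return alloc


# Hypothetical models and budget; camera counts per model drive the split.
print(allocate_workers({"OMDet": 60, "model-b": 30, "model-c": 10}, total_workers=12))
print(allocate_workers({"OMDet": 90, "model-b": 5, "model-c": 5}, total_workers=12))
```

In the first call, OMDet serves 60% of the cameras and receives half of the 12 Workers. When its share rises to 90%, it receives 9 Workers, and the other two models drop to 2 and 1 Workers.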
5. Priority Queue-Based Request Scheduling
In large-scale camera environments, certain cameras may temporarily generate a large number of requests. If requests are processed in a simple FIFO order, one camera may occupy excessive system resources, delaying inference for other cameras.
EVA manages inference requests by camera using a priority queue so that all cameras can access the AI pipeline fairly.
| Scheduling Criteria | Processing Method |
|---|---|
| Cameras with fewer processed requests | Processed first |
| Requests under the same condition | Processed in arrival order |
| When the queue is full | Older requests from cameras with excessive accumulated requests are removed first |
| Frames that are no longer timely | Excluded from subsequent inference |
This structure prevents inference for other cameras from being delayed even when requests temporarily spike from a specific camera. It also prevents outdated frames from being processed too late and causing incorrect alarms, helping maintain the stability of the real-time AI pipeline.
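The scheduling rules in the table can be sketched as a small fair queue. This is an illustration under assumptions, not EVA's scheduler. The class name `FairInferenceQueue`, the capacity, and the `max_age_s` staleness threshold are hypothetical. The next camera is chosen with a linear scan over cameras rather than a heap, which keeps the ordering rule explicit. The ordering follows the table:

- fewest processed requests first, with arrival order as the tie-breaker;
- eviction from the camera with the most queued requests when the queue is full;
- stale frames skipped at dequeue time.

```python
from __future__ import annotations

import itertools
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    camera_id: str
    frame_ts: float  # capture time of the frame, in seconds
    seq: int         # global arrival order


class FairInferenceQueue:
    """Hypothetical per-camera fair queue following the scheduling table."""

    def __init__(self, capacity: int, max_age_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity      # assumed >= 1
        self.max_age_s = max_age_s    # frames older than this are no longer timely
        self.clock = clock
        self.pending: dict[str, deque[Request]] = {}
        self.processed: dict[str, int] = {}
        self.size = 0
        self._seq = itertools.count()

    def push(self, camera_id: str, frame_ts: float) -> None:
        if self.size >= self.capacity:
            # Queue full: drop the oldest request of the camera with the most queued requests.
            heaviest = max(self.pending, key=lambda c: len(self.pending[c]))
            self.pending[heaviest].popleft()
            self.size -= 1
        request = Request(camera_id, frame_ts, next(self._seq))
        self.pending.setdefault(camera_id, deque()).append(request)
        self.size += 1

    def pop(self) -> Optional[Request]:
        now = self.clock()
        while self.size:
            # Fewest processed requests first; arrival order breaks ties.
            camera = min(
                (c for c, q in self.pending.items() if q),
                key=lambda c: (self.processed.get(c, 0), self.pending[c][0].seq),
            )
            request = self.pending[camera].popleft()
            self.size -= 1
            if now - request.frame_ts > self.max_age_s:
                continue  # stale frame: exclude it from inference
            self.processed[camera] = self.processed.get(camera, 0) + 1
            return request
        return None


now = [100.0]  # fake clock so the example is deterministic
queue = FairInferenceQueue(capacity=4, max_age_s=1.0, clock=lambda: now[0])
for _ in range(3):
    queue.push("cam-busy", frame_ts=100.0)
queue.push("cam-quiet", frame_ts=100.0)
queue.push("cam-busy", frame_ts=100.0)  # full: cam-busy's oldest request is evicted
order = []
while (req := queue.pop()) is not None:
    order.append((req.camera_id, req.seq))
print(order)  # [('cam-busy', 1), ('cam-quiet', 3), ('cam-busy', 2), ('cam-busy', 4)]
```

Even though `cam-busy` submitted four of the five requests, `cam-quiet` is served second rather than last.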
6. Dynamic FPS Control Based on Operating State
EVA dynamically adjusts video collection and processing frequency based on each camera’s operating state.
A simple connection state, an AI inference state, and a live streaming state all require different levels of processing. If all cameras are processed at the same FPS, unnecessary CPU, GPU, and network usage increases.
| Item | Description |
|---|---|
| Optimization Target | CPU, GPU, Network |
| Key Approach | Dynamically adjust collection, analysis, and streaming FPS based on camera operating state |
| Effect | Reduced CPU usage by approximately 35% when 50 cameras were connected in the Monitoring On state |
For example, in a simple connection state, EVA collects only the minimum number of frames needed to maintain the connection. When AI inference is required, only the frames needed for analysis are selected and processed. Higher-FPS streaming is performed only when a user is watching the live video.
In particular, EVA does not decode every collected frame or use every frame for AI analysis. It selectively extracts only the frames needed for inference, minimizing unnecessary video collection, decoding, preprocessing, and inference operations.
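The state-dependent frame budget can be sketched as follows. The state names, the FPS values in `FPS_POLICY`, and the helper functions are hypothetical placeholders rather than EVA's settings, and connection keep-alive traffic is not modeled. With inter-frame codecs such as H.264, a decoder can generally skip only non-reference frames or restrict itself to keyframes. For that reason, this sketch only decides which frames continue to preprocessing and inference.

```python
from enum import Enum


class CameraState(Enum):
    CONNECTED = "connected"   # only keeping the connection alive
    INFERENCE = "inference"   # AI analysis active (e.g. Monitoring On)
    LIVE_VIEW = "live_view"   # a user is watching the live video


# Hypothetical per-state targets: (frames per second sent to inference, streaming FPS).
FPS_POLICY = {
    CameraState.CONNECTED: (0, 0),
    CameraState.INFERENCE: (2, 0),
    CameraState.LIVE_VIEW: (2, 15),
}


def frames_to_analyze(source_fps: float, target_fps: float, n_frames: int) -> list:
    """Indices of frames that go on to preprocessing and inference."""
    if target_fps <= 0:
        return []
    step = source_fps / target_fps  # keep one frame every `step` source frames
    selected, next_pick = [], 0.0
    for i in range(n_frames):
        if i >= next_pick:
            selected.append(i)
            next_pick += step
    return selected


def plan_second(state: CameraState, source_fps: int) -> dict:
    """Work plan for one second of video from a camera in the given state."""
    analyze_fps, stream_fps = FPS_POLICY[state]
    return {
        "analyze": frames_to_analyze(source_fps, analyze_fps, source_fps),
        "stream_fps": stream_fps,
    }


for state in CameraState:
    print(state.name, plan_second(state, source_fps=30))
```

For a 30 FPS source in the inference state, only 2 of every 30 frames are passed on for analysis, and streaming work is scheduled only when a viewer is present.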
Internal validation showed that, under the same server specifications, CPU usage for 50 cameras in the Monitoring On state dropped from around 100% to around 65%, a reduction of approximately 35% in CPU resource usage.
7. Multi-User Streaming Optimization
In monitoring environments, multiple users often watch the same camera stream at the same time.
In a conventional structure, as the number of users increases, decoding, preprocessing, and encoding may be repeatedly performed for the same video stream, significantly increasing CPU and memory usage. EVA reduces this overhead by sharing video processing results for the same camera.
| Item | Description |
|---|---|
| Optimization Target | CPU, Memory, Network |
| Key Approach | Decode, preprocess, and encode the same camera stream once, then share it with multiple users |
| Effect | Prevents CPU and memory usage from increasing linearly with the number of users |
EVA performs decoding, preprocessing, and encoding only once for the same camera, then shares the generated stream with multiple users. As a result, even when the number of users increases, the additional cost is mainly limited to network transmission.
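The shared-stream pattern can be sketched as a per-camera fan-out. The `SharedCameraStream` class, the `_decode_and_encode` stub (which merely uppercases bytes), and the drop-oldest policy for slow viewers are hypothetical. The sketch shows that the expensive step runs once per frame regardless of how many viewers are subscribed. Each additional viewer only adds a queue operation, standing in for network transmission.

```python
from __future__ import annotations

import asyncio


class SharedCameraStream:
    """Hypothetical per-camera stream: process each frame once, fan out to all viewers."""

    def __init__(self, camera_id: str, viewer_buffer: int = 8):
        self.camera_id = camera_id
        self.viewer_buffer = viewer_buffer
        self.viewers: list[asyncio.Queue] = []
        self.encode_calls = 0

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=self.viewer_buffer)
        self.viewers.append(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self.viewers.remove(queue)

    def _decode_and_encode(self, raw: bytes) -> bytes:
        # Stand-in for decode + preprocess + encode; this is the expensive part.
        self.encode_calls += 1
        return raw.upper()

    def publish(self, raw: bytes) -> None:
        if not self.viewers:
            return  # nobody is watching: skip the streaming work entirely
        packet = self._decode_and_encode(raw)  # once per frame, not once per viewer
        for queue in self.viewers:
            if queue.full():
                queue.get_nowait()  # slow viewer: drop its oldest packet instead of blocking
            queue.put_nowait(packet)  # per-viewer cost: just delivery


async def demo(num_viewers: int, num_frames: int):
    stream = SharedCameraStream("cam-1")
    viewers = [stream.subscribe() for _ in range(num_viewers)]
    for i in range(num_frames):
        stream.publish(f"frame-{i}".encode())
    received = [[q.get_nowait() for _ in range(q.qsize())] for q in viewers]
    return stream.encode_calls, received


encode_calls, received = asyncio.run(demo(num_viewers=5, num_frames=3))
print(encode_calls, len(received), received[0])  # 3 5 [b'FRAME-0', b'FRAME-1', b'FRAME-2']
```

Five viewers receive three packets each, but `_decode_and_encode` runs only three times.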
This allows EVA to maintain stable streaming quality while minimizing server resource usage, even when many users monitor video streams simultaneously.
Closing
EVA’s infrastructure efficiency does not depend solely on single-GPU performance. It is the result of optimizing the entire pipeline, including video ingestion, frame selection, AI inference, request scheduling, and streaming delivery.
EVA combines the following technologies to reliably support more cameras in the same server environment.
| Optimization Technology | Main Effect |
|---|---|
| VM·VLM Separation | Minimizes high-cost VLM calls |
| Task Decomposition and Parallel Processing | Improves scenario throughput by more than 3x |
| Detection Mode-Based Optimization | Uses models according to scenario characteristics |
| Dynamic Worker Allocation | Optimizes GPU and memory usage based on request volume by model |
| Priority Queue | Provides fair inference opportunities across cameras |
| Dynamic FPS Control | Reduces CPU, GPU, and network usage |
| Multi-User Streaming Optimization | Eliminates redundant decoding and encoding |
Ultimately, EVA’s large-scale camera capacity is not achieved by a single optimization technique. It is the result of designing the AI model structure and system infrastructure together to use limited server resources more efficiently.
EVA will continue to enhance its infrastructure optimization technologies to reliably support more cameras, more complex scenarios, and more diverse operating environments.