Launching EVA: The World’s First Commercial VLM Service on Rebellions NPU
In close collaboration with Rebellions, the EVA team has continuously advanced the technology stack and successfully built a production-grade NPU-based runtime environment for EVA. We are now moving beyond technical validation and officially entering the phase of commercial service deployment.
1. ATOM-MAX NPU Performance Validation: A New Standard for VLM Inference (As-Is)
EVA recently evaluated operational feasibility with the Qwen3 VL 8B model on Rebellions' latest ATOM-MAX NPU environment. This was not just a benchmark for model accuracy, but a validation of key operational requirements for real industrial services.
- Rebellions ATOM-MAX (NPU) / Qwen3 VL 8B / Accuracy 0.7996 / F1 0.6733
- GPU A100 / Qwen3 VL 8B FP8 / Accuracy 0.7779 / F1 0.5979
Compared with the A100 GPU baseline, the NPU run scored higher on both reported metrics: accuracy of 0.7996 versus 0.7779, and F1 of 0.6733 versus 0.5979. In fire and smoke detection scenarios in particular, the NPU environment delivered stronger results, which supports its applicability to high-complexity industrial safety monitoring.
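To make the size of the gap concrete, the short sketch below computes the absolute and relative differences from the two result rows listed above. Only the four metric values come from the evaluation; the dictionary keys and the `compare` helper are illustrative names, not part of EVA.

```python
# Reported figures from the evaluation above (Qwen3 VL 8B on both devices).
RESULTS = {
    "npu": {"accuracy": 0.7996, "f1": 0.6733},
    "gpu_a100_fp8": {"accuracy": 0.7779, "f1": 0.5979},
}


def compare(metric: str) -> tuple:
    """Return (absolute gap, relative gap in percent) of the NPU run over the GPU run."""
    npu = RESULTS["npu"][metric]
    gpu = RESULTS["gpu_a100_fp8"][metric]
    absolute = npu - gpu
    relative_pct = absolute / gpu * 100.0
    return round(absolute, 4), round(relative_pct, 2)


if __name__ == "__main__":
    for metric in ("accuracy", "f1"):
        absolute, relative_pct = compare(metric)
        print(f"{metric}: +{absolute:.4f} absolute, +{relative_pct:.2f}% relative")
```

On these figures the accuracy gap is about 0.02 in absolute terms, while the F1 gap is noticeably larger at about 0.075.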
2. Optimization and Stability for Commercial Operations (As-Is)
In real-world deployments, AI systems must handle far more than clean benchmark inputs. Mixed text-image requests and multiple simultaneous camera streams can easily create bottlenecks. To address this, EVA has continuously improved optimization at both the NPU compiler and system levels.
It is critical to build a resource orchestration framework that efficiently distributes CPU, memory, and NPU workloads, so multiple AI Agents can run concurrently without performance degradation. It is equally important to resolve unexpected failures and ensure stable, uninterrupted operation when text-only and image-analysis requests arrive at the same time.
- Complex data processing stabilization: We resolved the malfunctions that could occur in multi-core environments when text-only and text-plus-image requests are mixed, significantly improving operational reliability.
- Resource efficiency optimization: By precisely controlling data processing policies across CPU, memory, and NPU, we achieved a high-efficiency runtime where multiple VLM instances can run simultaneously without inference speed degradation.
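The general idea behind this kind of orchestration can be sketched with bounded concurrency: one limit for CPU-side image preprocessing and another for NPU inference slots, so that text-only and text-plus-image requests share the device without oversubscribing it. This is a minimal illustration under stated assumptions, not EVA's actual runtime. The `Request` and `Orchestrator` classes, the slot counts, and the `asyncio.sleep` calls standing in for real work are all hypothetical.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass


@dataclass
class Request:
    request_id: int
    text: str
    image: bytes | None = None  # None means a text-only request


class Orchestrator:
    """Hypothetical scheduler that caps concurrent CPU preprocessing and NPU inference."""

    def __init__(self, npu_slots: int, preprocess_slots: int) -> None:
        self._npu = asyncio.Semaphore(npu_slots)
        self._cpu = asyncio.Semaphore(preprocess_slots)
        self._npu_in_use = 0
        self.peak_npu_in_use = 0  # recorded so the cap can be checked afterwards

    async def _preprocess(self, image: bytes) -> int:
        async with self._cpu:
            await asyncio.sleep(0)  # stand-in for image decode and resize on the CPU
            return len(image)

    async def _infer(self, req: Request, has_image: bool) -> str:
        async with self._npu:
            self._npu_in_use += 1
            self.peak_npu_in_use = max(self.peak_npu_in_use, self._npu_in_use)
            try:
                await asyncio.sleep(0.001)  # stand-in for the NPU inference call
                kind = "text+image" if has_image else "text-only"
                return f"{req.request_id}:{kind}"
            finally:
                self._npu_in_use -= 1

    async def handle(self, req: Request) -> str:
        # Only image requests take the preprocessing path; both kinds share NPU slots.
        if req.image is not None:
            await self._preprocess(req.image)
        return await self._infer(req, req.image is not None)


async def run_mixed_batch(n: int, npu_slots: int = 2) -> tuple:
    """Submit n alternating text-only and text+image requests concurrently."""
    orch = Orchestrator(npu_slots=npu_slots, preprocess_slots=4)
    requests = [
        Request(i, "describe the scene", b"\x00" * 8 if i % 2 else None)
        for i in range(n)
    ]
    results = await asyncio.gather(*(orch.handle(r) for r in requests))
    return list(results), orch.peak_npu_in_use


if __name__ == "__main__":
    results, peak = asyncio.run(run_mixed_batch(6))
    print(results, "peak NPU slots in use:", peak)
```

Separating the two limits means a burst of image requests queues at the preprocessing stage instead of blocking text-only requests that need no preprocessing.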
3. Throughput Optimization Based on Parallel Architecture (To-Be)
EVA is also pursuing full-stack parallelization to make full use of the multi-core architecture of the Rebellions NPU and to deepen end-to-end technology integration.
- Parallelization strategy: We are developing techniques to remove VLM inference bottlenecks by applying data parallelism (DP) to the Vision Encoder and tensor parallelism (TP) to the Text Decoder.
- Integrated operations strategy: We are defining the optimal number of concurrent instances and core allocation ratios across multiple NPU resources. The goal is to reach GPU-level throughput while significantly improving performance-per-watt and reducing TCO (Total Cost of Ownership).
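The difference between the two parallelism styles named above can be shown with a toy example in plain Python. Data parallelism splits the batch of inputs across cores while every core keeps the full weights. Tensor parallelism splits the weight matrix itself (here by columns) while every core sees the full input. The single matrix multiplication standing in for each model stage, the function names, and the core counts are assumptions for illustration only and do not reflect the Rebellions compiler or SDK.

```python
from __future__ import annotations


def matmul(a: list, b: list) -> list:
    """Plain row-by-column matrix product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_even(items: list, parts: int) -> list:
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def encode_images_dp(images: list, weights: list, cores: int) -> list:
    """Data parallelism: each core holds the full weights and a slice of the batch."""
    outputs = []
    for shard in split_even(images, cores):  # one shard per (simulated) core
        if shard:
            outputs.extend(matmul(shard, weights))
    return outputs


def decode_step_tp(hidden: list, weights: list, cores: int) -> list:
    """Tensor parallelism: each core holds a slice of the weight columns."""
    columns = list(zip(*weights))
    partials = []
    for col_shard in split_even(columns, cores):  # one column shard per core
        if col_shard:
            w_shard = [list(row) for row in zip(*col_shard)]  # back to row-major
            partials.append(matmul(hidden, w_shard))
    # Concatenate each row's partial outputs along the feature axis.
    return [sum((p[i] for p in partials), []) for i in range(len(hidden))]


if __name__ == "__main__":
    batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    w = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]
    assert encode_images_dp(batch, w, cores=2) == matmul(batch, w)
    assert decode_step_tp(batch, w, cores=2) == matmul(batch, w)
    print("sharded results match the single-core result")
```

Both sharded versions reproduce the single-core result. What changes is which dimension is divided, and therefore what each core has to hold in memory and what has to be gathered afterwards.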
Closing: The Commercial Era of Efficient Industrial AI
The combination of EVA and Rebellions NPU is not a simple hardware replacement. It represents a full-stack transformation toward always-on AI inference in the field with predictable operations and a strong balance of high performance, high efficiency, and high stability. Based on validated NPU optimization technologies, EVA will accelerate digital transformation in industrial environments with a more cost-efficient operating model.