Optimizing Detection Operations with Meta Agent
New in EVA v3.0: Meta Agent for Easier and More Accurate Detection Operations
To operate EVA reliably for a specific purpose, key settings such as object targets, detection sensitivity, and vision models must be continuously tuned to each scenario and camera environment. The core challenge is that it is very difficult for humans to consistently decide "what value should be set now" by combining historical data with the current scene.
To reduce this operational burden and make the desired detections easier and more accurate, EVA introduces Meta Agent. Meta intelligence will continue to evolve; v3.0 delivers the first two capabilities: object-sensitivity recommendation and vision-model recommendation.
1. Why Meta Agent Is Needed More Than Ever
In small environments, manual camera-by-camera tuning may appear manageable. But once the number of cameras exceeds 100, the operational reality changes completely.
Each camera has different installation height, field of view, lighting, background reflection, and workflow pattern. Even with the same scenario, false-positive/false-negative patterns vary by camera. At that point, per-camera optimization becomes necessary, and that work requires both significant effort and specialized operational know-how.
Typical pain points in real operations:
- If object definitions are too broad or ambiguous, false positives increase and unnecessary inference grows.
- If sensitivity is too low, non-target objects are detected; if too high, required targets are missed.
- Model performance varies by scenario (for example, fire/smoke/fall), camera angle, lighting, and background structure.
- Even after changing settings, effects are hard to validate immediately, and manual tuning cost rises sharply with camera count.
In short, one-time initial setup cannot sustain long-term detection quality. A data-driven auto-optimization layer is required.
2. How Meta Agent Operates
Meta Agent periodically checks each camera state and operates based on the following signals:
- Recent detected-alert data
- Accumulated user feedback
- Accumulated image volume available for recommendation decisions
The key point is this: the more feedback a camera accumulates, the more frequently Meta Agent can analyze that camera, and the more accurate its recommendations become.
So Meta Agent is not a one-time analyzer. It continuously adjusts recommendation frequency and quality by reflecting both camera-level operating status and feedback accumulation.
This process does not end at recommendation alone. By continuously analyzing real detection data from live environments, it also improves EVA's operating strategy and recommendation precision over time.
As the camera count grows and more data accumulates, EVA becomes better fitted to real field conditions.
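The feedback-driven cadence described in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not EVA's implementation: the `CameraState` fields, the interval bounds, the halving rule, and the readiness check are all hypothetical. The article states only that cameras with more accumulated feedback are analyzed more often; the 20-image minimum is taken from the recommendation logic in section 3.

```python
from dataclasses import dataclass

MIN_IMAGES = 20  # section 3 requires 20+ images before recommending


@dataclass
class CameraState:
    """Hypothetical per-camera signals; the field names are assumptions."""
    recent_alerts: int    # recent detected-alert count
    feedback_count: int   # accumulated user feedback
    image_count: int      # accumulated images usable for recommendation


def is_ready(state: CameraState) -> bool:
    """Assumed gate: analyze only cameras with recent alerts and enough images."""
    return state.recent_alerts > 0 and state.image_count >= MIN_IMAGES


def analysis_interval_hours(state: CameraState,
                            base_hours: float = 24.0,
                            min_hours: float = 1.0,
                            feedback_step: int = 50) -> float:
    """Shorten the time between analyses as feedback accumulates.

    Assumed rule: halve the interval for every `feedback_step` feedback
    items, never going below `min_hours`.
    """
    halvings = state.feedback_count // feedback_step
    return max(min_hours, base_hours / (2 ** halvings))
```

Under these assumed defaults, a camera with no feedback would be analyzed once a day, and one with 100 feedback items every six hours. The actual cadence EVA uses is not stated in this article.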
3. Object-Sensitivity Recommendation Logic
Meta Agent recommendation logic follows this sequence:
- Run inference on the same object with five vision models.
- Perform automatic labeling by ensembling model outputs.
- Verify, on at least 20 images, whether true and false detections can be separated.
- Generate sensitivity/model recommendations only when separability is confirmed.
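A minimal sketch of the labeling and separability steps above, under several assumptions: the ensemble uses a strict majority vote, sensitivity behaves like a minimum confidence score, and "separable" means that one model's scores for true detections all lie above its scores for false detections. The function names and the voting threshold are hypothetical; the article does not specify how EVA ensembles the models or tests separability.

```python
from typing import Dict, List, Optional

MIN_IMAGES = 20  # the text requires 20+ images


def ensemble_label(scores: Dict[str, float], vote_threshold: float = 0.5) -> bool:
    """Label a detection as true when a strict majority of models score it
    at or above `vote_threshold` (assumed voting rule)."""
    votes = sum(1 for s in scores.values() if s >= vote_threshold)
    return votes * 2 > len(scores)


def separating_threshold(model_scores: List[float],
                         labels: List[bool]) -> Optional[float]:
    """Return a score threshold that splits true from false detections for
    one model, or None when there are too few samples or the score ranges
    overlap."""
    if len(model_scores) < MIN_IMAGES:
        return None
    true_scores = [s for s, y in zip(model_scores, labels) if y]
    false_scores = [s for s, y in zip(model_scores, labels) if not y]
    if not true_scores or not false_scores:
        return None  # only one class present, nothing to separate
    lowest_true, highest_false = min(true_scores), max(false_scores)
    if lowest_true <= highest_false:
        return None  # overlapping scores: no clean split exists
    # Midpoint between the two groups (one simple, assumed choice).
    return (lowest_true + highest_false) / 2
```

A stricter or softer separability test (for example, tolerating a small number of overlapping samples) would fit the same structure; the all-or-nothing check here is chosen only to keep the sketch short.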
Why ensemble instead of a single model:
- Objects falsely detected by a vision model are likely to propagate as false positives into decisions made at the vision-language model (VLM) stage.
- If early visual recognition is unstable, higher-level reasoning can also be distorted.
- Therefore, establishing object validity through cross-model agreement is more stable than trusting one model output. [1][2]
There is also a clear efficiency benefit:
- VLM inference consumes substantial resources, so large-scale labeling by repeatedly calling VLM is inefficient.
- In contrast, multi-vision-model ensembling can validate large samples faster at lower cost, making it more practical for operations.
Internal validation showed that multi-model ensemble labeling achieved about 30% higher labeling accuracy than single-judgment labeling.
Because all vision models already run inference on the same images during this process:
- An appropriate sensitivity can be recommended immediately for the current model.
- If all scenarios share the same detection target, a model-change recommendation can also be made.
Recommendation message example:
False positives are occurring for detection target {object}. Try switching to model {A} and adjusting sensitivity to {0.xx}.
ℹ️ Model-change recommendation is provided only when all scenarios share the same detection target.
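The gating rule in the note above could be expressed as in the sketch below. It assumes a per-model sensitivity has already been computed (or is `None` when no usable value was found), and it assumes a simple preference: keep the current model when it has a usable sensitivity, and suggest a switch only otherwise. The function name, the parameters, and that preference are hypothetical; only the message wording and the same-target condition come from the article.

```python
from typing import Dict, List, Optional


def build_recommendation(target: str,
                         current_model: str,
                         sensitivities: Dict[str, Optional[float]],
                         scenario_targets: List[str]) -> Optional[str]:
    """Build a recommendation message, or return None when nothing can be
    recommended.

    `sensitivities` maps model name -> recommended sensitivity, with None
    meaning that model could not separate true from false detections.
    """
    prefix = f"False positives are occurring for detection target {target}. "

    # Assumed preference: adjust the current model first if possible.
    current = sensitivities.get(current_model)
    if current is not None:
        return prefix + f"Try adjusting sensitivity to {current:.2f}."

    # A model change is offered only when every scenario shares one target.
    if len(set(scenario_targets)) != 1:
        return None

    for model, value in sensitivities.items():
        if model != current_model and value is not None:
            return prefix + (f"Try switching to model {model} "
                             f"and adjusting sensitivity to {value:.2f}.")
    return None


if __name__ == "__main__":
    print(build_recommendation("person", "A",
                               {"A": None, "B": 0.6},
                               ["person", "person"]))
```

When the scenarios on a camera watch different targets, this sketch returns no model-change suggestion, matching the note that such recommendations require a shared detection target.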
Closing
Meta Agent is EVA's operational-optimization layer that shifts sensitivity/model tuning from intuition-driven manual work to data-driven recommendation.
Starting with object-sensitivity and vision-model recommendation in v3.0, Meta intelligence will continue to evolve to improve scenario-level operational quality over time.
Footnotes
1. Rohrbach et al., "Object Hallucination in Image Captioning" (EMNLP 2018), https://arxiv.org/abs/1809.02156
2. Leng et al., "Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding" (CVPR 2024), https://arxiv.org/abs/2311.16922



