Thinking Mode for More Accurate Detection of Immediately Discernible Hazards

May 28, 2026 · 6 min read

Hyunchan Moon

AI Specialist

Seongwoo Kong

AI Specialist

Gyulim Gu

Tech Leader

New in EVA v3.0: Thinking Mode for More Accurate Detection of Immediately Discernible Hazards

In EVA v3.0, we introduced Thinking Mode to reduce false positives in safety incident detection for hazards that can be identified immediately from a single scene. The key idea is that the vision-language model (VLM) does not jump to conclusions. Instead, it reasons internally first, checks for false-positive risks and logical inconsistencies from multiple angles, and only then makes the final hazard decision.

Thinking Mode is especially suitable for scenarios where the hazard is immediately discernible on screen, but false-positive risk still needs multi-angle verification.

Two problems motivate this design. First, VLMs can lean toward agreeing with the premise of a user's query, for example answering "yes" when asked "is there a fire?". Second, continuously adding exception rules to a base mode can increase latency and operational complexity.¹²

For these reasons, EVA v3.0 uses the model's internal Thinking process to interpret the scene quickly, weigh the possibility of a false positive with common-sense reasoning, and only then issue an alert, which improves overall detection reliability.

1. Why Thinking Mode Is Needed, and How It Improves Reliability

In a simple query-driven baseline, even visually obvious cases such as "is there a fire?" or "did a person fall?" can still generate false positives due to camera angle, reflections, occlusion, or low-light conditions.

On top of that, a rule-patching approach that keeps adding exception rules whenever false positives are found has clear operational limits:

- Increasing scenario-management complexity as exception rules accumulate
- Potentially longer inference paths and higher latency
- Reduced operational consistency due to user-by-user rule variations

To address this, Thinking Mode is designed to prioritize "review before decision" rather than "quick assertion."

Core Principles

- Focus on immediately discernible scenes: Prioritize incidents that can be judged clearly within a single frame.
- Multi-angle false-positive review: Use internal reasoning to check counter-evidence and logical contradictions first.
- Final alerts must match incident items: Trigger alerts only when detected evidence aligns with user-configured incident items.

Decision Guidelines

Thinking Mode explicitly defines the following operating policies at the system-prompt level so the model does not drift into overly deep reasoning.

- Direct visual evidence first: Use only directly visible evidence from the main scene.
- Judge only active incidents: Evaluate only active incident items from user input, preserving order.
- Return False when ambiguous: If evidence is weak, noisy, occluded, or ambiguous, return False.
- Incident-specific precision guards:
  - Fall/down: Crouching, kneeling, sitting, or perspective ambiguity alone must not trigger True.
  - Smoke/spark: Steam, dust, reflection, blur, or lighting artifacts alone must not trigger True.
  - Fire: Bright light or reflection without direct flame/combustion evidence must not trigger True.
- Strict output normalization: Return per-incident outputs only in the required JSON schema for stable post-processing.
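
The "return False when ambiguous" and "strict output normalization" policies can be illustrated on the post-processing side. The following is a minimal sketch, not EVA's actual implementation: the article does not publish the JSON schema, so the flat `{"incident name": true/false}` shape and the function name `normalize_output` are assumptions made for illustration.

```python
import json


def normalize_output(raw: str, active_incidents: list[str]) -> dict[str, bool]:
    """Map a model reply onto the active incidents, in their configured order.

    Assumed (hypothetical) reply schema: a flat JSON object that maps each
    incident name to a boolean.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    # Only a literal JSON `true` counts as an alert. Missing keys, nulls, and
    # strings such as "maybe" fall back to False. Keys that are not active
    # incidents are ignored.
    return {name: parsed.get(name) is True for name in active_incidents}


reply = '{"Smoke outbreak": "maybe", "Fire outbreak": true, "Flood": true}'
print(normalize_output(reply, ["Fire outbreak", "Smoke outbreak"]))
```

Defaulting to False on anything that is not an explicit `true` keeps a malformed or hedged reply from turning into an alert, which matches the precision-first intent of the guidelines.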

In short, Thinking Mode is not just a detector. It is an operational decision mode that combines evidence-centered judgment with false-positive suppression guards.

2. Performance Review

Thinking Mode is evaluated not only by final right/wrong labels, but by how the agent produces intermediate evidence and matches that evidence against scenario rules. As shown below, the model first reasons over scene cues, then determines whether to alert by matching user-requested incidents.

Source: National Information Society Agency (NIA), Korea

This case corresponds to fall/down detection. In Thinking Mode, the model reaches alert: true after checking the following core reasoning points.

- Scene analysis: Detect a worker in PPE (white protective suit and hard hat) lying horizontally on the industrial floor.
- Posture judgment: Determine that the posture is not crouching/sitting/working posture, but loss of upright posture with body-on-ground evidence.
- Fall guard validation: Verify that the condition of direct visual evidence for collapse/body-on-ground is satisfied.
- Ambiguity check: Confirm low likelihood of benign alternatives (for example stretching or temporary posture change).
- Final decision: Sufficient direct evidence of a fall/down event, therefore alert: true.

In contrast, the next case looks like a vehicle fire at first glance but is not one. Here too, Thinking Mode avoids deciding on a single cue and reviews the scene from multiple angles.

Key checkpoints from intermediate reasoning:

- Scene-context check: Confirm nighttime parking-lot context with multiple strong light sources (streetlights/headlights).
- Re-interpret suspicious cue: Treat rear red glow as a possible reflection/light artifact instead of immediately labeling it as flame.
- Apply fire guards: Re-check direct flame shape, combustion signals, and smoke evidence.
- Final decision: No clear flame/combustion evidence, therefore alert: false.

This shows that even for visually confusing cues such as strong red illumination, Thinking Mode reviews counter-evidence before deciding, reducing false positives and delivering more reliable alerts.
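
The two cases above follow the same pattern: confounding cues are set aside, and only direct evidence can produce an alert. In EVA this review happens inside the model's reasoning. The sketch below merely restates the pattern as code, and the cue names, the two dictionaries, and the `decide` function are hypothetical labels paraphrased from the guards, not part of the product.

```python
# Cue names paraphrase the article's guards; the data structure is hypothetical.
DIRECT_EVIDENCE = {
    "fall": {"body_on_ground", "collapse"},
    "fire": {"flame", "combustion"},
}
CONFOUNDERS = {
    "fall": {"crouching", "kneeling", "sitting", "perspective_ambiguity"},
    "fire": {"bright_light", "reflection"},
}


def decide(incident: str, cues: set[str]) -> dict:
    """Alert only when at least one direct-evidence cue was observed."""
    direct = cues & DIRECT_EVIDENCE[incident]
    ignored = cues & CONFOUNDERS[incident]
    return {
        "alert": bool(direct),
        "direct": sorted(direct),
        "ignored": sorted(ignored),
    }


# Parking-lot case: red glow and strong lights, but no flame or combustion.
print(decide("fire", {"bright_light", "reflection"}))
# Worker case: body-on-ground evidence is present.
print(decide("fall", {"body_on_ground"}))
```

Returning the ignored cues alongside the decision mirrors how the intermediate reasoning records why a suspicious cue was not treated as evidence.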

The scenario-level metrics measured for Thinking Mode and Base Mode are as follows:

Fall/Down

Mode	Accuracy	Precision	Recall	F1 Score
Thinking Mode	0.8967	0.5814	0.8621	0.6944
Base Mode	0.6854	0.2841	0.8621	0.4274

Smoke/Spark

Mode	Accuracy	Precision	Recall	F1 Score
Thinking Mode	0.9041	1.0000	0.8158	0.8986
Base Mode	0.7671	0.9565	0.5789	0.7213

Fire

Mode	Accuracy	Precision	Recall	F1 Score
Thinking Mode	0.9755	0.7627	0.9000	0.8257
Base Mode	0.9328	0.4762	0.4000	0.4348

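In summary, Thinking Mode raises the F1 score in all three immediately discernible scenarios: from 0.4274 to 0.6944 for fall/down, from 0.7213 to 0.8986 for smoke/spark, and from 0.4348 to 0.8257 for fire. For fall/down, recall is unchanged at 0.8621 while precision roughly doubles (0.2841 to 0.5814), so the gain there comes from fewer false positives.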
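
As a quick consistency check, the F1 scores in the tables can be recomputed from the reported precision and recall using F1 = 2PR / (P + R). The sketch below uses only the numbers from the tables above. The tolerance of 0.001 is an assumption that allows for the inputs having been rounded to four decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (scenario, mode) -> (precision, recall, reported F1), copied from the tables.
REPORTED = {
    ("Fall/Down", "Thinking Mode"): (0.5814, 0.8621, 0.6944),
    ("Fall/Down", "Base Mode"): (0.2841, 0.8621, 0.4274),
    ("Smoke/Spark", "Thinking Mode"): (1.0000, 0.8158, 0.8986),
    ("Smoke/Spark", "Base Mode"): (0.9565, 0.5789, 0.7213),
    ("Fire", "Thinking Mode"): (0.7627, 0.9000, 0.8257),
    ("Fire", "Base Mode"): (0.4762, 0.4000, 0.4348),
}

for (scenario, mode), (p, r, reported) in REPORTED.items():
    computed = f1_score(p, r)
    status = "ok" if abs(computed - reported) < 1e-3 else "mismatch"
    print(f"{scenario:12s} {mode:14s} F1={computed:.4f} reported={reported:.4f} {status}")
```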

⚠️ Thinking Mode can also perform well in other scenario types, but for cases requiring detailed human-action/gear-state inspection or strict environment-specific rule reasoning, inference time can become very long. We recommend using it selectively for suitable scenarios.

3. Usability: Expand Coverage by Editing Incident Items, Without Adding New Scenarios

In EVA v3.0, you can extend Thinking Mode to additional immediately discernible incidents simply by editing the incident items, without creating a new scenario each time.

## 💡 Detection with Thinking Mode
When an object is detected, the agent reviews false-positive possibility and logical inconsistencies at the configured detection interval,
then decides whether a hazardous incident is present. If hazardous, an alert is triggered.

### Hazard Incidents (up to 3)
Complex or ambiguous incidents may require longer reasoning time.
Please define incidents that can be clearly judged within a single scene.

- Fire outbreak

**Good examples**
- Fire outbreak
- Smoke outbreak
- Person collapsed on the floor

**Not recommended**
- Hazard detection based on specific human actions
- Scenarios requiring composite object-state reasoning
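
The limit of three incident items shown in the settings panel could be enforced before the items reach the model. This is a minimal sketch under stated assumptions: the function name `clean_incident_items` is hypothetical, and trimming whitespace and dropping blanks and duplicates are illustrative choices that the article does not describe.

```python
MAX_INCIDENTS = 3  # "Hazard Incidents (up to 3)" in the settings panel


def clean_incident_items(items: list[str]) -> list[str]:
    """Trim items, drop blanks and duplicates (order preserved), enforce the limit."""
    cleaned: list[str] = []
    for item in items:
        text = item.strip()
        if text and text not in cleaned:
            cleaned.append(text)
    if len(cleaned) > MAX_INCIDENTS:
        raise ValueError(
            f"at most {MAX_INCIDENTS} incident items are allowed, got {len(cleaned)}"
        )
    return cleaned


print(clean_incident_items([" Fire outbreak ", "", "Smoke outbreak", "Fire outbreak"]))
```

Preserving the configured order matters because the decision guidelines require the model to evaluate active incident items in the order the user supplied them.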

Closing

The core of Thinking Mode is "review before decision" rather than "immediate reaction." This helps lower false positives in immediately discernible incident scenarios and provides alert quality that operators can trust in real deployments.

¹ Li et al., "Evaluating Object Hallucination in Large Vision-Language Models" (EMNLP 2023), https://arxiv.org/abs/2305.10355
² Anthropic, "Towards Understanding Sycophancy in Language Models" (2023), https://www.anthropic.com/research/towards-understanding-sycophancy-in-language-models
