From One-Shot Decisions to Two-Stage Reasoning

December 2, 2025 · 7 min read

Seongwoo Kong

AI Specialist

Jisu Kang

AI Specialist

Keewon Jeong

Solution Architect

Instead of Making a Single Decision, Be Cautious Step-by-Step

Making a decision from a single camera image is more complex for an AI than most people think. A user may simply ask, “Notify me if someone falls down” or “Alert me when a worker isn’t wearing a mask.” But the AI has to analyze the image, check the requested conditions, consider exceptions, make the final decision, and explain its reasoning, all in a single pass.

In EVA, we introduced an Enriched Input structure that separates the user’s requirements into Detection conditions and Exception conditions, which significantly improved performance. However, even with structured input, the AI still made contradictory judgments in multi-condition scenarios.

The issue was not only how the conditions were structured, but also that the AI was forced to make multiple judgments at once. So EVA moved beyond the limits of the existing one-shot approach and introduced a new Two-Stage Reasoning process.

In this post, we cover:

Why structured input alone could not solve the problem
The fundamental limits of one-shot decision-making
Why AI works better when decisions are split into two stages
Performance improvements validated by real experiments

1. Problems That Structured Input Alone Cannot Solve

By converting natural language inputs into structured Enriched Input, EVA achieved noticeable performance improvements for simple tasks such as “mask detection” or “fall detection.” However, confusing Vision Language Model (VLM) behavior still surfaced.

Example Case

Detection Steps

Person present in the image → True
At least one person appears to have fallen → True

Exceptions

Hard to confirm body shape of fallen person (occlusion, etc.) → True
Unable to confirm human form clearly (only silhouette or less than 50% visible, etc.) → False
The fallen person seems not in danger (lying on a desk, using phone, etc.) → True
Hard to make accurate judgment due to low image quality → False

AI Decision

Detection result: False
Evidence: A person appears to be lying down, but it does not seem dangerous.

Despite clearly provided detection and exception logic, the detection result, exception result, and evidence contradict one another — suggesting that forcing multiple decisions in a single inference pass may be fundamentally flawed.
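For reference, the structured input in the example case above can be pictured as a small data structure. This is only a sketch: the class name `EnrichedInput` and its field names are hypothetical, since the post does not show EVA's actual schema. The condition texts are copied from the example.

```python
from dataclasses import dataclass, field


@dataclass
class EnrichedInput:
    """Hypothetical container for a structured user request (not EVA's real schema)."""
    detection_steps: list[str] = field(default_factory=list)
    exceptions: list[str] = field(default_factory=list)


# Conditions taken from the fall-detection example case.
fall_request = EnrichedInput(
    detection_steps=[
        "Person present in the image",
        "At least one person appears to have fallen",
    ],
    exceptions=[
        "Hard to confirm body shape of fallen person (occlusion, etc.)",
        "Unable to confirm human form clearly (only silhouette or less than 50% visible, etc.)",
        "The fallen person seems not in danger (lying on a desk, using phone, etc.)",
        "Hard to make accurate judgment due to low image quality",
    ],
)
```

In the one-shot setup, all six conditions plus the final decision and the explanation are requested from the model in the same pass.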

2. Fundamental Limits of One-Shot Judgment

In EVA, the AI simultaneously performs:

Image analysis
Checking detection condition
Checking exception condition
Making the final decision
Generating explanation

The errors occurring at the moment of final decision fall into three categories:

A conclusion derived only from detection conditions
A conclusion derived only from exception conditions
Failure to properly integrate both → contradictory results

As the number of conditions grows, VLMs struggle to keep detection, exception evaluation, conclusion, and explanation consistent in a single pass. The effect is more pronounced in lightweight models, which have less capacity to handle all of these at once.

3. Introducing Two-Stage Reasoning

The Google Brain paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (Wei et al., 2022) showed that LLMs solve complex problems better when the reasoning process is broken into steps than when an answer is required immediately. The key idea, that complex reasoning improves when it is broken down into smaller steps, applies well to VLM-based judgment.

EVA’s Two-Stage Reasoning

“Don’t ask the model to do everything at once. Ask step-by-step.”

Stage 1 – Detection: “Is this a suspicious situation?”
- Goal: Don’t miss anything
- Exceptions not considered
Stage 2 – Exception Checking: “Could this be normal?”
- Goal: Reduce false alarms
- Focus only on special cases where mistakes are likely
- If any exception is True → no alert

Image → [Detection VLM] → (Fail) → Normal End
                        └→ (Pass) → [Exception VLM] → (Exception True) → No Alert
                                                    └→ (Exception False) → Trigger Alert

This is essentially applying Chain-of-Thought reasoning as structured rules tailored to a monitoring solution.

Humans do the same — making multiple decisions at once easily leads to mistakes, but judgments become much more accurate when done step-by-step.

With this structure, the VLM focuses on one mission per stage, improving consistency.
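The routing in the diagram above can be sketched in a few lines. This is a minimal illustration, not EVA's implementation: `two_stage_decide`, the `Stage` type, and the returned labels are hypothetical names, and the two stages are passed in as plain callables standing in for the real VLM calls.

```python
from typing import Callable

# Hypothetical stage signature: takes an image (any object) and returns a bool.
Stage = Callable[[object], bool]


def two_stage_decide(image: object, detect: Stage, has_exception: Stage) -> str:
    """Route one image through detection, then exception checking."""
    if not detect(image):
        return "normal_end"      # Stage 1 did not pass: nothing suspicious
    if has_exception(image):
        return "no_alert"        # Stage 2 found an exception that applies
    return "trigger_alert"       # suspicious, and no exception applies


# Stub stages standing in for real VLM calls.
result = two_stage_decide("frame.jpg", lambda img: True, lambda img: False)
print(result)
```

Note that the exception stage runs only when detection passes, so images with nothing suspicious cost a single model call in this sketch.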

4. Detection Step & Exceptions

Below is the structure reflecting Two-Stage Reasoning.

4.1 Detection Stage — Capture Suspicious Cases Step-by-Step

Stage 1 has a clear mission:

“Do not miss anything.”

In other words, cast a wide net to catch suspicious situations.

Principles:

Evaluate conditions sequentially
Do not consider exceptions
False positives are acceptable (They will be filtered later)

Example: Mask Non-Wearing Detection

Is there any person in the image?
Does at least one person appear to not be wearing a mask?

This stage is intentionally broad.
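A minimal sketch of the sequential evaluation described above, under two assumptions that the post does not confirm: each condition is asked as a separate question, and evaluation stops at the first condition that fails. The `ask` callable is a hypothetical stand-in for a VLM call that answers True or False.

```python
from typing import Callable


def run_detection(image: object, steps: list[str],
                  ask: Callable[[object, str], bool]) -> bool:
    """Evaluate detection steps in order; stop at the first False."""
    for question in steps:
        if not ask(image, question):
            return False   # this step failed, so later steps are skipped
    return True            # every step held: hand the image to Stage 2


# Detection steps from the mask example; exceptions are deliberately absent.
mask_steps = [
    "Is there any person in the image?",
    "Does at least one person appear to not be wearing a mask?",
]
```

Because the step list contains no exception questions, the function cannot weigh them, which matches the "do not consider exceptions" principle.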

4.2 Exception Stage — Filter Out Normal Situations Precisely

The goal here is opposite:

“Reduce false positives.”

Even if suspicious, the situation may still be normal — this stage filters such cases.

A key rule introduced in EVA:

“Exceptions are defined only at the overall level.”

Exceptions are not evaluated per person or for partially ambiguous regions. Each one asks whether the situation as a whole is normal.

Examples:

Is every person wearing a mask?
Is judgment difficult due to occlusion or poor quality?

If either is True → Normal (No alert)
If both are False → Real issue (Alert)
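The rule above reduces to a single `any` over the exception answers. A short sketch, where `is_normal` is a hypothetical helper name and the answer values are made up for illustration:

```python
def is_normal(exception_answers: dict[str, bool]) -> bool:
    """Return True when any overall-level exception holds (no alert)."""
    return any(exception_answers.values())


# Illustrative answers for the two mask exceptions (values are invented).
answers = {
    "Is every person wearing a mask?": False,
    "Is judgment difficult due to occlusion or poor quality?": True,
}
decision = "no_alert" if is_normal(answers) else "alert"
print(decision)
```

Here the second exception is True, so the situation is treated as normal even though the first one is False.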

5. Advantages of Two-Stage Reasoning

Two-Stage reasoning is not just “adding another step.”

5.1 Role Separation → Less Confusion

Stage 1 focuses on recall
Stage 2 focuses on precision

The model no longer needs to satisfy conflicting objectives at once.

5.2 Clearer Judgment Boundaries

Stage 1 → “Suspicious?”
Stage 2 → “Normal?”

Clarity in criteria leads to clarity in reasoning.

5.3 Preventing Confirmation Bias

Initially, EVA passed the Stage 1 outcome into Stage 2 to help detection. But the result was the opposite: Stage 2 blindly followed Stage 1 decisions.

Now, both stages make decisions independently, using only the image — improving results significantly.
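One way to picture this independence is that each stage's prompt is built only from its own questions, with no slot for the other stage's answers. The sketch below is an assumption about how that could look: `build_stage_prompt` and the prompt wording are hypothetical and are not EVA's actual prompts.

```python
def build_stage_prompt(task: str, questions: list[str]) -> str:
    """Build a prompt from one stage's own questions only."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    return f"{task}\nAnswer True or False for each item:\n{numbered}"


stage1_prompt = build_stage_prompt(
    "Decide whether the image shows a suspicious situation.",
    ["Is there any person in the image?",
     "Does at least one person appear to not be wearing a mask?"],
)
# Stage 2 is built the same way and never receives Stage 1's answers.
stage2_prompt = build_stage_prompt(
    "Decide whether the situation in the image is normal.",
    ["Is every person wearing a mask?",
     "Is judgment difficult due to occlusion or poor quality?"],
)
```

Both prompts would be sent together with the same image, so Stage 2 has nothing from Stage 1 to follow blindly.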

6. Performance Evaluation

Here are results applying Two-Stage Reasoning per scenario:

6.1 Fall Detection

Method	accuracy	precision	recall
One-Shot Decision	0.99	0.87	0.97
Two-Stage Reasoning	0.99	0.99	0.91

Precision improved from 0.87 to 0.99, which means far fewer false positives. Recall dropped from 0.97 to 0.91, and accuracy stayed at 0.99.

6.2 Mask Non-Wearing Detection

Method	accuracy	precision	recall
One-Shot Decision	0.59	0.73	0.52
Two-Stage Reasoning	0.61	0.70	0.63

This is a difficult task because every person's mask must be verified. Even so, recall improved from 0.52 to 0.63 thanks to Stage 1, while precision slipped slightly from 0.73 to 0.70.

6.3 Others (Arson, Bus, etc.)

Method	accuracy	precision	recall
One-Shot Decision	0.60	0.63	0.53
Two-Stage Reasoning	0.61	0.70	0.63

Even in challenging cases:

Visual illusions caused by shadows
Low image quality
Complex environments

False positives dropped (precision 0.63 → 0.70) and detection performance improved (recall 0.53 → 0.63).

Two-Stage Reasoning enables AI to make more accurate, logical, and trustworthy judgments.
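Because precision and recall move in opposite directions in some scenarios, a combined score can help compare the two methods. The sketch below computes F1, the harmonic mean of precision and recall, from the values in the tables above. F1 is not reported in the original evaluation, so it is a derived figure added here for illustration only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the tables above.
reported = {
    "Fall / One-Shot": (0.87, 0.97),
    "Fall / Two-Stage": (0.99, 0.91),
    "Mask / One-Shot": (0.73, 0.52),
    "Mask / Two-Stage": (0.70, 0.63),
    "Others / One-Shot": (0.63, 0.53),
    "Others / Two-Stage": (0.70, 0.63),
}
f1_by_method = {name: f1_score(p, r) for name, (p, r) in reported.items()}
for name, f1 in f1_by_method.items():
    print(f"{name}: F1 = {f1:.2f}")
```

Under this derived metric, Two-Stage Reasoning comes out ahead in all three scenarios, including fall detection, where the precision gain outweighs the recall loss.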

Conclusion

Through Two-Stage Reasoning, we learned:

What matters is not which model you use, but how clearly you instruct it.

When asked to do everything at once, AI becomes confused and contradictory. But when tasks are divided with a clear role for each stage, AI becomes far more stable and consistent.

This structural improvement plays a crucial role in making EVA a more reliable solution in the field.

References

J. Wei, X. Wang, D. Schuurmans, M. Bosma, E. H. Chi, Q. Le, and D. Zhou. Chain of thought prompting elicits reasoning in large language models. CoRR, abs/2201.11903, 2022 (Paper)
Language Models Perform Reasoning via Chain of Thought (Google Research)
Chain-of-thought reasoning supercharges enterprise LLMs (K2View Blog)
Marcetic, Darijan & Hrkać, Tomislav & Ribaric, S. (2016). Two-stage cascade model for unconstrained face detection. 1-4. 10.1109/SPLIM.2016.7528404 (ResearchGate)
Survey/Research on LLM hallucination and step-by-step mitigation pipelines (MDPI)
