This work is a collaborative research effort with Minjoon Son (advised by Prof. Youngmyung Ko) as part of the "Campus: Beyond Safety to Intelligence – Postech Living Lab Project with EVA"
🎯 Introduction: Shifting Feedback from 'Retrospective Correction' to 'Cognitive Enhancement'
When EVA makes a judgment from an image, operators often leave specific feedback such as: "This is indeed a safety vest. Why did it get confused?" or "Shouldn't there be an alert here?" Such feedback carries more than a correct/incorrect label; it also encodes the human reasoning and context behind the judgment.
Previously, EVA stored this feedback in a separate Vector DB and used it to adjust the Alert status whenever a similar situation recurred. This approach was quick to apply, but it had a structural limitation: it never improved the model's intrinsic reasoning capability and merely filtered errors after the fact.
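To make the earlier pipeline concrete, here is a minimal sketch of the retrieve-and-override pattern it followed. The `embed` function, the feedback entries, and the similarity threshold are all illustrative stand-ins, not EVA's actual implementation; the point is only that past feedback overrides the model's output rather than changing how the model reasons.

```python
import numpy as np

# Hypothetical embedding function; EVA's real encoder is not shown here.
def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

# Feedback store: each entry pairs an embedded scene description with an alert override.
feedback_store = [
    {"embedding": embed("worker wearing orange safety vest near crane"),
     "override": "suppress_alert"},
    {"embedding": embed("person without helmet inside restricted zone"),
     "override": "raise_alert"},
]

def adjust_alert(scene_description: str, model_alert: str, threshold: float = 0.8) -> str:
    """Retrieve the most similar past feedback and, if close enough, override the model's alert."""
    query = embed(scene_description)
    best = max(feedback_store, key=lambda e: float(query @ e["embedding"]))
    if float(query @ best["embedding"]) >= threshold:
        return best["override"]   # retroactive filtering: feedback overrides the raw output
    return model_alert            # otherwise keep the model's original decision
```

Whatever the exact retrieval backend, the model itself is untouched: the correction lives outside the weights, which is exactly the structural limitation described above.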
To address this limitation at its root, we changed our approach entirely. Instead of treating user feedback as simple error reports, we reconstructed it as Instruction Data that the model can internalize directly into its reasoning, strengthening its Visual Reasoning capability.
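As an illustration of what this reconstruction might look like, the sketch below converts an operator feedback record into a conversation-style instruction-tuning sample. The field names, the prompt, and the LLaVA-style conversation format are assumptions for illustration, not the exact schema we use; the key idea is that the operator's reasoning becomes part of the training target, not just the corrected label.

```python
import json

# Hypothetical raw feedback record from an operator; field names are illustrative only.
feedback = {
    "image_path": "frames/site_cam3_0142.jpg",
    "model_answer": "No safety vest detected.",
    "operator_comment": "This is indeed a safety vest. The reflective strips are partially "
                        "occluded by the harness, which likely confused the model.",
    "correct_answer": "The worker is wearing a safety vest.",
}

def to_instruction_sample(fb: dict) -> dict:
    """Convert operator feedback into a conversation-style instruction-tuning sample."""
    return {
        "image": fb["image_path"],
        "conversations": [
            {"from": "human",
             "value": "<image>\nIs the worker wearing a safety vest? Explain your reasoning."},
            {"from": "gpt",
             # The operator's reasoning is folded into the target response,
             # so the model learns the "why" behind the judgment, not only the corrected answer.
             "value": f"{fb['correct_answer']} {fb['operator_comment']}"}
        ],
    }

print(json.dumps(to_instruction_sample(feedback), indent=2))
```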
This article focuses on how Instruction Tuning a VLM on user feedback data overcomes the limitations of the earlier Vector DB-centric approach and improves the model's visual reasoning performance.