Add server-side ensemble prediction: Add /predictions/ensemble support with model-specific thresholds, vote-based detection aggregation, and current-model result reporting.
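
The vote-based aggregation described in this entry can be sketched roughly as follows. This is a minimal illustration, not the shipped implementation: the `Detection` type, the `aggregate_by_vote` helper, the IoU-based grouping rule, and the default values for `min_votes` and `iou_threshold` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    # Hypothetical container; the real response schema may differ.
    model: str
    label: str
    box: tuple  # (xmin, ymin, xmax, ymax)
    score: float


def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def aggregate_by_vote(detections, thresholds, min_votes=2, iou_threshold=0.5):
    """Keep one box per object that enough distinct models agree on."""
    # Step 1: apply each model's own score threshold (assumed default 0.5).
    kept = [d for d in detections if d.score >= thresholds.get(d.model, 0.5)]
    kept.sort(key=lambda d: d.score, reverse=True)

    # Step 2: greedily group same-label boxes that overlap the group's
    # highest-scoring box.
    groups = []
    for det in kept:
        for group in groups:
            lead = group[0]
            if det.label == lead.label and iou(det.box, lead.box) >= iou_threshold:
                group.append(det)
                break
        else:
            groups.append([det])

    # Step 3: a group survives only if enough distinct models voted for it.
    return [g[0] for g in groups if len({d.model for d in g}) >= min_votes]
```

Counting distinct models rather than raw boxes keeps a single model from out-voting the others by emitting duplicate boxes.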
Add ROI-only detection behavior: Filter detection responses to return only boxes inside configured interest areas so downstream VLM flows receive area-relevant detections only.
Improve EVA App synchronization flow: Make POST /sync the source of truth for camera counts, health snapshots, and dynamic worker scaling.
Refactor proxy into modular API, schema, and service layers: Split the proxy into route modules, Pydantic schemas, TorchServe services, health services, camera registry, and scaling policy modules while preserving compatibility wrappers for older imports.
Add background health checker and probes: Track latest health state from FastAPI, TorchServe inference/management endpoints, model worker readiness, optional inference probes, accelerator detection, and system resource thresholds.
Add dynamic worker scaling policy: Introduce camera-count based worker scaling with model resource profiles, min/max worker constraints, and optional GPU memory validation.
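
As a rough illustration of camera-count based scaling, the sketch below derives a worker count from the number of cameras and clamps it to the configured bounds. The function name `target_workers`, its parameters, and the simple "cameras per worker" ratio are assumptions for the example; the actual policy is driven by the model resource profiles.

```python
import math


def target_workers(camera_count, cameras_per_worker, min_workers=1, max_workers=4,
                   gpu_free_mb=None, gpu_mb_per_worker=None):
    """Return a worker count for one model (hypothetical policy sketch)."""
    if cameras_per_worker <= 0:
        raise ValueError("cameras_per_worker must be positive")

    # Enough workers to cover every camera, capped at the configured maximum.
    workers = min(math.ceil(camera_count / cameras_per_worker), max_workers)

    # Optional GPU memory validation: do not plan more workers than fit.
    if gpu_free_mb is not None and gpu_mb_per_worker:
        workers = min(workers, int(gpu_free_mb // gpu_mb_per_worker))

    # The configured minimum is applied last so it is always preserved.
    return max(workers, min_workers)
```

Applying the minimum last means a sync that reports zero cameras still leaves `min_workers` running.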
Add CUDA MPS deployment support: Add GPU_MPS_ENABLED, MPS pipe/log directory handling, daemon startup/shutdown in run.sh, and Helm values/volume mounts for MPS environments.
Add MIG GPU deployment support: Add Helm configuration for MIG resource names, GPU counts, and CUDA visible devices.
Optimize MMGDino and LLMDet inference: Add fixed image preprocessing buckets, GPU-side image preparation, cached text features, and compiled model submodules for faster repeated zero-shot inference.
Fix ensemble fallback to unavailable models: Skip models that are not registered or do not have READY TorchServe workers instead of falling back to unavailable ensemble candidates.
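
A minimal sketch of the readiness check, assuming the proxy already holds the parsed JSON from the TorchServe management "describe model" call (a list of version entries, each with a `workers` list whose items carry a `status` field). The helper names and the `describe` callable are hypothetical.

```python
def has_ready_worker(describe_response):
    """True if any worker in a parsed describe-model response is READY.

    `describe_response` is None (or empty) when the model is not registered.
    """
    if not describe_response:
        return False
    for version in describe_response:
        for worker in version.get("workers", []):
            if worker.get("status") == "READY":
                return True
    return False


def select_servable(candidates, describe):
    """Keep only ensemble candidates that can actually serve requests.

    `describe` is a hypothetical callable mapping a model name to its parsed
    describe-model response, or None when the model is not registered.
    """
    return [name for name in candidates if has_ready_worker(describe(name))]
```

Models that fail the check are simply left out of the ensemble instead of being used as a fallback target.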
Fix non-serving model handling in ensemble prediction: Check TorchServe management state before dispatching ensemble requests so unavailable models do not break prediction flows.
Fix MIG metric permission failures: Treat unavailable MIG memory/utilization metrics as missing optional data instead of raising VE-4xx style resource failures.
Fix MIG UUID handling: Use MIG UUIDs whenever visible so GPU discovery works correctly in MIG-partitioned deployments.
Fix missing MIG utilization metrics: Allow utilization fields to be absent without failing resource schemas or health responses.
Fix minimum workers during sync: Preserve configured minimum worker counts when EVA App sync updates camera counts.
Fix SYNC API request field name: Change sync request format from id to eva_id.
Fix RT-DETRV2 person target labeling: Correct RT-DETRV2 person target handling so person-only detection is labeled consistently.
Fix long-running sync/scaling timeout: Increase the timeout from 90 seconds to 3600 seconds so long-running worker operations can complete.
Fix Helm boolean handling: Ensure explicit false values are rendered correctly instead of being treated the same as empty values.
Add configurable TorchServe log profiles: Add separate dev and prod Log4j2 profiles so development runs can keep full model and metrics visibility while production runs reduce model log volume to warning/error records.
Improve model log observability: Add Python logger and model-name metadata to TorchServe MODEL_LOG output so administrators can identify which handler produced each model.log entry.
Preserve access and metrics visibility for latency analysis: Keep ACCESS_LOG, TS_METRICS, and MODEL_METRICS available in both log profiles so HTTP latency can be compared with TorchServe HandlerTime, queue, and system metrics.
Add log profile support to service startup: Extend run.sh with --log-profile dev|prod, default development logging, production Docker startup arguments, per-log retention cleanup, and safer shell execution behavior.
Tune proxy logging by profile: Allow proxy INFO log sampling to be controlled at startup, with full INFO logging in development and sampled INFO logging in production.
Use the fast RT-DETR V2 processor explicitly: Load RTDetrImageProcessorFast with use_fast=True to avoid processor fallback ambiguity.
Fix handler log source attribution in TorchServe: Reformat Python worker log records before TorchServe captures them so model.log no longer collapses all handler output into an untraceable MODEL_LOG stream.
Fix IG processor device placement: Pass device=self.device through IG processor calls so generated tensors align with the handler device during embedding and inference flows.
Fix repeated TorchServe pynvml warning noise: Suppress the TorchServe system metrics pynvml deprecation FutureWarning so it does not repeatedly pollute errors.log.
Fix archive/runtime packaging for logging utilities: Include the shared logging utility in model archive extra files from eva_ts/utils while keeping the handlers directory limited to handler files.
Optimize IG inference runtime: Refactor IG embedding and inference paths into reusable core modules, apply torch.compile(dynamic=True) with eager fallback, and enable CUDA bfloat16 execution for improved serving efficiency.
Strengthen IG API validation and documentation: Add proxy-side validation for IG inference payloads and document the full get_embedding / inference request-response formats, including scenario thresholds and interest zones.
Tune default serving worker settings: Reduce default OmDet and MMGDino worker counts from 2 to 1 for a lighter default runtime profile.
Fix IG runtime packaging and utility resolution: Include the new IG utility module in the model archive flow and improve interest-zone utility imports so IG runtime dependencies resolve correctly in TorchServe.
Fix IG box conversion compatibility: Replace torchvision.ops.box_convert in IG hot paths with a Torch-native conversion helper to avoid compile and runtime compatibility issues.
Align Helm-based TorchServe configuration flow: Mount config.properties.template from the Helm ConfigMap to /app/eva_ts/config.properties.template so runtime configuration can be generated from the deployed template.
Apply Helm model and worker settings at runtime: Ensure TorchServe model settings such as loaded models, worker counts, batch size, and batch delay configured in Helm values are reflected in the generated config.properties.
Expose additional TorchServe runtime controls: Add number_of_netty_threads and worker_retry_timeout_sec to the Helm-rendered TorchServe configuration template.
Refine Kubernetes service defaults: Change the default service type from NodePort to ClusterIP and render nodePort only when the service type is NodePort.
Tune default serving resources and model parameters: Increase default memory requests and limits to 32Gi, reduce selected model worker counts to 1, and adjust RT-DETRV2 batch settings for a more stable runtime profile.
Fix Helm values not being applied to TorchServe runtime configuration: Resolve an issue where model and worker settings defined in Helm values.yaml could be ignored because the deployed configuration path did not match the runtime config generation flow.
Fix RT-DETRV2 torch.compile behavior for dynamic inputs: Enable dynamic=True when compiling RT-DETRV2 to better support variable input shapes.
Fix IG model archive dependencies: Include interest_zone.py in the IG model archive configuration so interest-zone logic is available at runtime.
Add polygon-based interest handling: Introduce polygon foot-point support and apply the new interests interface across handlers and proxy models for more precise region-based filtering.
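
Foot-point filtering can be sketched as below. The example assumes the foot point is the bottom-center of the bounding box and uses a standard ray-casting point-in-polygon test; the helper names are illustrative, and the shipped interests interface may define the foot point or the boundary handling differently.

```python
def foot_point(box):
    """Bottom-center of an (xmin, ymin, xmax, ymax) box (assumed definition)."""
    xmin, _ymin, xmax, ymax = box
    return ((xmin + xmax) / 2.0, ymax)


def point_in_polygon(point, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices in order."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_by_interest(boxes, polygons):
    """Keep boxes whose foot point falls inside at least one interest polygon."""
    return [b for b in boxes
            if any(point_in_polygon(foot_point(b), p) for p in polygons)]
```

Using the foot point rather than the box center means a person standing just outside a zone is excluded even when the upper body overlaps it.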
Implement threshold normalization and denormalization: Add support for converting threshold values between normalized and denormalized forms in the proxy service layer.
Add configurable boundary truncation controls: Introduce boundary truncation checks and modes (off / only_flag / on) to better control postprocessing behavior when detections touch image edges.
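
The three modes could behave roughly as in the sketch below. The semantics shown here are assumptions, since the entry does not spell them out: `off` leaves detections untouched, `only_flag` annotates each detection with a `truncated` flag, and `on` additionally drops flagged detections. The `margin` parameter and the dictionary layout are also illustrative.

```python
def apply_boundary_truncation(detections, image_size, mode="only_flag", margin=2):
    """Flag or drop detections that touch the image border (assumed semantics)."""
    if mode not in ("off", "only_flag", "on"):
        raise ValueError(f"unknown mode: {mode}")

    width, height = image_size
    results = []
    for det in detections:
        xmin, ymin, xmax, ymax = det["box"]
        # A box counts as truncated when any side lies within `margin` pixels
        # of the image edge.
        truncated = (xmin <= margin or ymin <= margin
                     or xmax >= width - margin or ymax >= height - margin)
        if mode == "off":
            results.append(dict(det))
        elif mode == "only_flag":
            results.append({**det, "truncated": truncated})
        elif not truncated:  # mode == "on": keep only untruncated boxes
            results.append({**det, "truncated": False})
    return results
```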
Add group-based VitPose verification and pipeline options: Extend the verification pipeline with group-based VitPose processing and configurable execution options.
Fix VitPose verification execution on response parsing mismatch: Resolve an issue where VitPose verification could be skipped because of a response format mismatch in the RT-DETR v2 handler.
Update MODEL_RESOURCE_PROFILES with benchmark results: Set camera-to-worker mappings, GPU memory per worker, and max_effective_cameras constraints based on benchmark tests across camera configurations.
Update dynamic worker scaling configs: Refine scaling parameters for improved resource allocation accuracy under production workloads.
Apply Vision Caching for MMGDino & LLMDet: Enable vision caching to reduce redundant computation and improve inference throughput.
Filter GPUs by CUDA_VISIBLE_DEVICES in get_gpu_info(): Fix GPU detection to respect the CUDA_VISIBLE_DEVICES environment variable, ensuring correct GPU counts and resource monitoring on multi-GPU hosts.
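
The filtering step can be sketched as follows. This is not the body of get_gpu_info(); `filter_visible_gpus` and the `index`/`uuid` dictionary keys are assumptions for the example. Entries are matched by device index or by exact UUID; abbreviated UUID prefixes are not handled in this sketch.

```python
import os


def filter_visible_gpus(gpus, visible=None):
    """Return the GPUs selected by CUDA_VISIBLE_DEVICES, in the listed order.

    `gpus` is a list of dicts with hypothetical "index" and "uuid" keys.
    """
    if visible is None:
        visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        return list(gpus)  # variable unset: every GPU is visible

    tokens = [t.strip() for t in visible.split(",") if t.strip()]
    selected = []
    for token in tokens:
        match = next(
            (g for g in gpus if token in (str(g["index"]), g.get("uuid"))),
            None,
        )
        if match is None:
            # CUDA ignores everything after the first invalid entry.
            break
        selected.append(match)
    return selected
```

An empty value yields an empty list, which mirrors how an empty CUDA_VISIBLE_DEVICES hides all devices.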
Add MM-Grounding-DINO (Swin-B) model: Introduce MM-Grounding DINO with Swin-B backbone for enhanced zero-shot object detection capabilities.
Implement App Camera Management & TorchServe management wrapping API: Add new API interfaces for managing camera connections and TorchServe model lifecycle, providing centralized control over vision pipeline components.
Add batch camera API: Implement batch processing capabilities for multiple camera streams, enabling efficient handling of concurrent camera inputs.
Add model worker management APIs: Implement GET models/workers/current endpoint to check current worker count per model and PUT models/{model_name}/workers endpoint for manual worker scaling, enabling fine-grained control over model serving resources.
Update LLMDet to Swin-B backbone: Upgrade LLMDet model from previous backbone to Swin-B for improved detection accuracy.
Fix LLMDet text prefix and crop region: Correct text prompt prefix formatting and fix crop area boundary issue (ymin: 1 → ymin: 0) for proper detection coverage.
Optimize RT-DETRv2 compilation: Specify device type explicitly and update RT-DETRv2 compile mode to reduce-overhead for better inference performance.
Enhance model compilation stability: Add dummy forward pass to verify torch.compile functionality; automatically fall back to original model if compilation fails; set compile mode to default for improved reliability.
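
The compile-verify-fallback pattern can be sketched as below. To keep the example free of a hard PyTorch dependency, the compile step is passed in as `compile_fn`; in a handler this would be something like `lambda m: torch.compile(m, mode="default")`. The helper name and signature are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def compile_with_fallback(model, compile_fn, dummy_input):
    """Return a compiled model, or the original model if compilation fails.

    Compilation is typically lazy, so errors may only surface on the first
    call; the dummy forward pass forces them to appear here rather than on
    the first real request.
    """
    try:
        compiled = compile_fn(model)
        compiled(dummy_input)  # dummy forward pass to verify the compiled model
        return compiled
    except Exception as exc:  # any failure means: serve the eager model
        logger.warning("compilation failed, using original model: %s", exc)
        return model
```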
Implement model warmup script: Add dedicated warmup functionality to optimize initial inference latency and ensure models are ready for production workloads.
Adjust intersection ratio threshold: Increase intersection ratio from 0.5 to 0.95 for more precise detection overlap filtering.
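
To make the effect of the new threshold concrete, the sketch below assumes the intersection ratio is the fraction of a detection box's own area that lies inside a rectangular interest area. That definition, and the helper names, are assumptions; the entry does not state the exact formula.

```python
def intersection_ratio(box, area):
    """Fraction of `box` that lies inside `area` (both xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(box[2], area[2]) - max(box[0], area[0]))
    ih = max(0.0, min(box[3], area[3]) - max(box[1], area[1]))
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    return (iw * ih) / box_area if box_area > 0 else 0.0


def is_inside(box, area, threshold=0.95):
    """True when at least `threshold` of the box lies inside the area."""
    return intersection_ratio(box, area) >= threshold
```

Under this definition, a box that lies half inside an area passed the old 0.5 threshold but is rejected at 0.95.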
Centralize port configuration with environment variables: Implement environment-based port management for improved deployment flexibility and configuration management across different environments.
Improve resource management: Refactor resource allocation and management for better system stability and performance.
Add RT-DETR V2 with VitPose model: Introduce new human detection model combining RT-DETR V2 object detection with VitPose pose estimation for human-only detection scenarios.
Add new API interface: Implement updated API interface compatible with EVA App v2.3.0, providing enhanced integration capabilities.
Add Kubernetes deployment support: Implement Helm chart configuration for containerized deployment on Kubernetes clusters, enabling scalable and managed production deployments.
Add foreground/background execution modes for proxy server: Introduce --foreground (-f) flag to run the proxy server in foreground mode with terminal output, while maintaining background mode with nohup as the default behavior for production deployments.
Add granular TorchServe log viewing: Implement individual log file viewing commands (access, errors, metrics, model, warnings) for more targeted debugging and monitoring of specific TorchServe components.
Add enhanced log viewer with file metadata: Display log file sizes, rotation status, and structured output for all TorchServe logs including access, errors, model metrics, model operations, and warnings.
Implement automatic absolute path resolution for log4j configuration: Add update_config_with_absolute_paths() function to dynamically convert relative log4j2.xml paths to absolute paths, ensuring proper logging regardless of execution context or working directory.
Enhance deployment script debugging capabilities: Add comprehensive debug output including config file changes, TorchServe startup command, periodic health checks during startup, and automatic error log display on startup failures.
Optimize Docker image for production readiness: Install essential system utilities (lsof, net-tools, procps) required by deployment script for port monitoring, process management, and service health checks.
Implement log retention strategy with automatic rotation: Configure separate retention policies for different log types (errors: 30 days, warnings: 7 days, access/metrics: 3 days) with gzip compression, reducing storage requirements by 90% while maintaining critical operational data.
Add intelligent metric sampling: Implement 6 logs/second sampling for MODEL_METRICS and TS_METRICS, capturing inference latency and performance metrics while reducing log volume by ~80% in high-traffic environments.
Optimize proxy server logging: Implement 1% sampling for INFO-level logs while preserving all ERROR/WARNING logs, reducing proxy log volume by 99% under high load (30+ req/sec).
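
One way to implement this kind of sampling with the standard `logging` module is a filter that passes a random fraction of INFO records and never drops WARNING or above. The class name and constructor arguments below are illustrative, not the proxy's actual code.

```python
import logging
import random


class InfoSamplingFilter(logging.Filter):
    """Pass roughly `sample_rate` of INFO records; keep WARNING and above."""

    def __init__(self, sample_rate=0.01, rng=None):
        super().__init__()
        self.sample_rate = sample_rate
        self._rng = rng or random.Random()

    def filter(self, record):
        if record.levelno >= logging.WARNING:
            return True  # errors and warnings are never sampled away
        if record.levelno == logging.INFO:
            return self._rng.random() < self.sample_rate
        return True  # other levels are left to the logger's level setting


# Usage: attach to the handler that writes the proxy log.
# handler.addFilter(InfoSamplingFilter(sample_rate=0.01))
```

Setting `sample_rate=1.0` restores full INFO logging, which suits a development profile.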
Update Dockerfile to run services in foreground mode: Modify CMD to use --foreground flag, enabling proper Docker log streaming and preventing container exit while maintaining service availability.
Fix CUDA processor initialization: Modify processor to execute on CUDA devices instead of CPU, significantly reducing CPU usage and improving inference performance.
Fix config file handling for multiple path patterns: Update sed patterns to handle various log4j path formats (file:///, file://, relative paths) ensuring robust configuration file processing across different environments.
Improve temp config file cleanup: Add explicit cleanup of temporary .properties.tmp files in clean command and error handling paths to prevent accumulation of temporary configuration files.
Fix log4j2.xml configuration loading: Resolve path resolution issues by using vmargs=-Dlog4j.configurationFile with absolute paths and replacing ${sys:log_location} with direct logs path references.
Add system utilities to Docker image: Install lsof, net-tools, and procps packages to support deployment script's port checking, process management, and service monitoring capabilities in containerized environments.
Add new detection models for enhanced object detection capabilities: Integrate OmDet-Turbo and LLMDet models for improved open vocabulary zero-shot object detection.
Add flexible model endpoint routing: Update the proxy server to support dynamic model-specific endpoints (changed from predictions/Owl-v2 to predictions/{model_name}) to accommodate various model prediction methods.
Add comprehensive log management commands: Introduce new logs, status, and clean commands for easier service monitoring and maintenance.
logs command: View service logs with configurable line count (e.g., ./run.sh logs proxy -n 100)
status command: Check running services, port status, and disk usage
clean command: Remove old rotated logs and temporary files (e.g., ./run.sh clean --days 30)
Implement automatic log rotation with retention policy: Configure Log4j2-based log rotation with 15-day retention, daily rotation, 100MB size limit, and automatic .gz compression for all TorchServe logs (access, model, service, metrics).
Enhance service cleanup reliability: Improve stop_services() function to handle partial service states gracefully with specific pattern matching (python.*/proxy/main.py) and health checks before stopping services.
Fix deployment script cleanup behavior: Correct cleanup trap to preserve background services on normal exit and only trigger cleanup on errors, preventing unintended service termination.
Improve log configuration for Docker compatibility: Update eva_ts/config.properties and create eva_ts/log4j2.xml with relative paths to ensure proper operation in containerized environments.
TorchServe Migration: Migrated from ALO ML framework to TorchServe for production-grade model serving with improved reliability and scalability.
Real-Time HTTP-Based Inference: Replaced file-based API communication with real-time HTTP-based inference endpoints for faster and more efficient processing.
Unified API Endpoint: Introduced a FastAPI-based proxy server that consolidates TorchServe's multiple ports (inference, management, and metrics) into a single unified interface while maintaining consistent API endpoint paths.
Optimized OWLv2 Model Handlers: Split the OWLv2 model into independent zero-shot detection and image-guided detection handlers for optimized batch inference.
Few-Shot Learning Support: Added few-shot learning capabilities through the image-guided detection handler.
Project Architecture Restructured: Reorganized project architecture with clear separation between TorchServe handlers and proxy server components for improved maintainability and scalability.
Utility Modules Streamlined: Optimized utility modules for better reusability across handlers and cleaner codebase organization.
Model Workflow Improved: Enhanced model download and packaging workflow with dedicated scripts for more efficient model management.
LLMDet and OmDet-Turbo Temporarily Removed: Temporarily removed the LLMDet and OmDet-Turbo models for migration to the TorchServe architecture. These models will be reintroduced in version 1.1.0 with TorchServe support.
Add new detection models for enhanced object detection capabilities: Integrate LLMDet and OmDet-Turbo models for open vocabulary zero-shot object detection.
Remove YOLOE model from the supported model list: Due to licensing issues, YOLOE has been excluded from the supported models.
Python 3.10 Compatibility Fixed: Resolved compatibility issues with Python 3.10 to ensure proper functionality across supported Python versions.
Ultralytics Dependencies Removed: Removed ultralytics dependencies to resolve version conflicts and improve package stability.
Bounding Box Handling Error Fixed: Fixed an error in bbox handling that occurred when processing multiple bounding boxes during few-shot learning operations.
Inference Dataset Folder Created: Added inference_dataset folder to establish a consistent inference workflow.