Update MODEL_RESOURCE_PROFILES with benchmark results: Set camera-to-worker mappings, GPU memory per worker, and max_effective_cameras constraints based on testing with camera configurations.
Update dynamic worker scaling configs: Refine scaling parameters for improved resource allocation accuracy under production workloads.
Apply vision caching for MM-Grounding-DINO & LLMDet: Enable vision caching to avoid redundant vision-encoder computation and improve inference throughput.
Filter GPUs by CUDA_VISIBLE_DEVICES in get_gpu_info(): Fix GPU detection to respect CUDA_VISIBLE_DEVICES environment variable, ensuring correct GPU count and resource monitoring in multi-GPU hosts.
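A minimal sketch of the filtering step, assuming get_gpu_info() starts from a list of physical GPU indices (the helper name below is illustrative, not the project's actual function):

```python
import os

def filter_by_cuda_visible_devices(all_gpu_indices):
    """Keep only the GPUs exposed via CUDA_VISIBLE_DEVICES.

    If the variable is unset or empty, every GPU is considered visible,
    matching CUDA's own behavior.
    """
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is None or visible.strip() == "":
        return list(all_gpu_indices)
    allowed = {int(i) for i in visible.split(",") if i.strip().isdigit()}
    return [idx for idx in all_gpu_indices if idx in allowed]
```

On a 4-GPU host with CUDA_VISIBLE_DEVICES=0,2, this reduces the monitored set to GPUs 0 and 2, so worker counts and memory stats match what the process can actually use.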
Add MM-Grounding-DINO (Swin-B) model: Introduce MM-Grounding-DINO with a Swin-B backbone for enhanced zero-shot object detection capabilities.
Implement app camera management & TorchServe management wrapper APIs: Add new API interfaces for managing camera connections and the TorchServe model lifecycle, providing centralized control over vision pipeline components.
Add batch camera API: Implement batch processing capabilities for multiple camera streams, enabling efficient handling of concurrent camera inputs.
Add model worker management APIs: Implement GET models/workers/current endpoint to check current worker count per model and PUT models/{model_name}/workers endpoint for manual worker scaling, enabling fine-grained control over model serving resources.
Update LLMDet to Swin-B backbone: Upgrade LLMDet model from previous backbone to Swin-B for improved detection accuracy.
Fix LLMDet text prefix and crop region: Correct text prompt prefix formatting and fix crop area boundary issue (ymin: 1 → ymin: 0) for proper detection coverage.
Optimize RT-DETRv2 compilation: Specify device type explicitly and update RT-DETRv2 compile mode to reduce-overhead for better inference performance.
Enhance model compilation stability: Add dummy forward pass to verify torch.compile functionality; automatically fall back to original model if compilation fails; set compile mode to default for improved reliability.
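The compile-then-verify pattern can be sketched generically as below; `compile_fn` stands in for torch.compile(model, mode="default") so the sketch stays framework-agnostic:

```python
def compile_with_fallback(model, compile_fn, dummy_input):
    """Compile a model, verify it with a dummy forward pass, and fall back
    to the original model if either step fails.

    `compile_fn` is e.g. `lambda m: torch.compile(m, mode="default")`.
    """
    try:
        compiled = compile_fn(model)  # may raise on unsupported ops
        compiled(dummy_input)         # dummy forward pass surfaces lazy compile errors early
        return compiled
    except Exception:
        return model                  # graceful fallback: serve the eager model
```

The dummy forward pass matters because torch.compile is lazy: many failures only appear on the first real inference, which this check moves to startup.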
Implement model warmup script: Add dedicated warmup functionality to optimize initial inference latency and ensure models are ready for production workloads.
Adjust intersection ratio threshold: Increase intersection ratio from 0.5 to 0.95 for more precise detection overlap filtering.
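One plausible reading of this check, shown as a sketch: the entry does not define the denominator, so dividing by the smaller box's area is an assumption here (intersection-over-smaller-box is a common overlap-filtering metric):

```python
def intersection_ratio(box_a, box_b):
    """Intersection area divided by the smaller box's area (assumed definition).

    Boxes are (xmin, ymin, xmax, ymax) tuples.
    """
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    smaller = min(area_a, area_b)
    return inter / smaller if smaller > 0 else 0.0

def overlaps_too_much(box_a, box_b, threshold=0.95):  # threshold was 0.5
    return intersection_ratio(box_a, box_b) >= threshold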
Centralize port configuration with environment variables: Implement environment-based port management for improved deployment flexibility and configuration management across different environments.
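A sketch of the environment-based lookup; the variable names and defaults below are illustrative, not the project's actual configuration keys:

```python
import os

def get_port(env_var, default):
    """Read a port number from the environment, falling back to a default
    when the variable is unset or not a valid integer."""
    value = os.environ.get(env_var, "").strip()
    return int(value) if value.isdigit() else default

# Illustrative names/defaults only; the real deployment may differ.
PROXY_PORT = get_port("PROXY_PORT", 8080)
TS_INFERENCE_PORT = get_port("TS_INFERENCE_PORT", 8081)
```

Centralizing lookups like this lets the same image run in dev, Docker, and Kubernetes by changing only environment variables.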
Improve resource management: Refactor resource allocation and management for better system stability and performance.
Add RT-DETRv2 with ViTPose model: Introduce a new human detection model combining RT-DETRv2 object detection with ViTPose pose estimation for human-only detection scenarios.
Add new API interface: Implement updated API interface compatible with EVA App v2.3.0, providing enhanced integration capabilities.
Add Kubernetes deployment support: Implement Helm chart configuration for containerized deployment on Kubernetes clusters, enabling scalable and managed production deployments.
Add foreground/background execution modes for proxy server: Introduce --foreground (-f) flag to run the proxy server in foreground mode with terminal output, while maintaining background mode with nohup as the default behavior for production deployments.
Add granular TorchServe log viewing: Implement individual log file viewing commands (access, errors, metrics, model, warnings) for more targeted debugging and monitoring of specific TorchServe components.
Add enhanced log viewer with file metadata: Display log file sizes, rotation status, and structured output for all TorchServe logs including access, errors, model metrics, model operations, and warnings.
Implement automatic absolute path resolution for log4j configuration: Add update_config_with_absolute_paths() function to dynamically convert relative log4j2.xml paths to absolute paths, ensuring proper logging regardless of execution context or working directory.
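A simplified sketch of what such a conversion can look like, operating on a config.properties body (the real function edits files in place and likely handles more key patterns):

```python
import os

def update_config_with_absolute_paths(config_text, base_dir):
    """Rewrite a relative log4j2 config reference to an absolute path.

    Sketch only: handles the vmargs=-Dlog4j.configurationFile=... line;
    the project's real implementation may cover additional keys.
    """
    lines = []
    for line in config_text.splitlines():
        if line.startswith("vmargs=") and "log4j.configurationFile=" in line:
            prefix, _, path = line.partition("log4j.configurationFile=")
            if not os.path.isabs(path):
                path = os.path.join(base_dir, path)
            line = prefix + "log4j.configurationFile=" + path
        lines.append(line)
    return "\n".join(lines)
```

Resolving against a known base directory (rather than the current working directory) is what makes logging behave identically whether TorchServe is launched from the repo root, a systemd unit, or a container entrypoint.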
Enhance deployment script debugging capabilities: Add comprehensive debug output including config file changes, TorchServe startup command, periodic health checks during startup, and automatic error log display on startup failures.
Optimize Docker image for production readiness: Install essential system utilities (lsof, net-tools, procps) required by deployment script for port monitoring, process management, and service health checks.
Implement log retention strategy with automatic rotation: Configure separate retention policies for different log types (errors: 30 days, warnings: 7 days, access/metrics: 3 days) with gzip compression, reducing storage requirements by 90% while maintaining critical operational data.
Add intelligent metric sampling: Implement 6 logs/second sampling for MODEL_METRICS and TS_METRICS, capturing inference latency and performance metrics while reducing log volume by ~80% in high-traffic environments.
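A fixed-rate sampler like the one described can be sketched as a per-second counting window (class name and structure are illustrative, not taken from the project):

```python
import time

class RateLimitedSampler:
    """Allow at most `max_per_second` records in each one-second window;
    records beyond the budget are dropped until the next window."""

    def __init__(self, max_per_second=6):
        self.max = max_per_second
        self.window = None  # integer second of the current window
        self.count = 0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        window = int(now)
        if window != self.window:           # new second: reset the budget
            self.window, self.count = window, 0
        if self.count < self.max:
            self.count += 1
            return True
        return False
```

At 6 records/second this keeps a steady signal for latency dashboards while bounding metric log growth regardless of traffic.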
Optimize proxy server logging: Implement 1% sampling for INFO-level logs while preserving all ERROR/WARNING logs, reducing proxy log volume by 99% under high load (30+ req/sec).
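Level-aware sampling fits naturally into a logging.Filter; this sketch assumes Python's standard logging module is used by the proxy (the class name is illustrative):

```python
import logging
import random

class InfoSamplingFilter(logging.Filter):
    """Pass all WARNING-and-above records; pass INFO and below at `rate`
    (0.01 keeps roughly 1% of routine request logs)."""

    def __init__(self, rate=0.01):
        super().__init__()
        self.rate = rate

    def filter(self, record):
        if record.levelno >= logging.WARNING:
            return True                     # never drop warnings or errors
        return random.random() < self.rate  # sample routine INFO traffic
```

Attached to the proxy's handler (handler.addFilter(InfoSamplingFilter())), this keeps every error visible while cutting per-request INFO noise under sustained load.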
Update Dockerfile to run services in foreground mode: Modify CMD to use --foreground flag, enabling proper Docker log streaming and preventing container exit while maintaining service availability.
Fix CUDA processor initialization: Modify processor to execute on CUDA devices instead of CPU, significantly reducing CPU usage and improving inference performance.
Fix config file handling for multiple path patterns: Update sed patterns to handle various log4j path formats (file:///, file://, relative paths) ensuring robust configuration file processing across different environments.
Improve temp config file cleanup: Add explicit cleanup of temporary .properties.tmp files in clean command and error handling paths to prevent accumulation of temporary configuration files.
Fix log4j2.xml configuration loading: Resolve path resolution issues by using vmargs=-Dlog4j.configurationFile with absolute paths and replacing ${sys:log_location} with direct logs path references.
Add system utilities to Docker image: Install lsof, net-tools, and procps packages to support deployment script's port checking, process management, and service monitoring capabilities in containerized environments.
Add new detection models for enhanced object detection capabilities: Integrate OmDet-Turbo and LLMDet models for improved open vocabulary zero-shot object detection.
Add flexible model endpoint routing: Update the proxy server to support dynamic model-specific endpoints (changed from predictions/Owl-v2 to predictions/{model_name}) to accommodate the prediction methods of different models.
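The routing change amounts to building the upstream TorchServe URL from the requested model name instead of a hard-coded one; the base URL below is an assumed default:

```python
def upstream_prediction_url(model_name, ts_base="http://localhost:8080"):
    """Build the TorchServe inference URL for a given model.

    Previously the proxy targeted the fixed path predictions/Owl-v2;
    templating on the model name lets one proxy route to any registered model.
    """
    return f"{ts_base}/predictions/{model_name}"
```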
Add comprehensive log management commands: Introduce new logs, status, and clean commands for easier service monitoring and maintenance.
logs command: View service logs with configurable line count (e.g., ./run.sh logs proxy -n 100)
status command: Check running services, port status, and disk usage
clean command: Remove old rotated logs and temporary files (e.g., ./run.sh clean --days 30)
Implement automatic log rotation with retention policy: Configure Log4j2-based log rotation with 15-day retention, daily rotation, 100MB size limit, and automatic .gz compression for all TorchServe logs (access, model, service, metrics).
Enhance service cleanup reliability: Improve stop_services() function to handle partial service states gracefully with specific pattern matching (python.*/proxy/main.py) and health checks before stopping services.
Fix deployment script cleanup behavior: Correct cleanup trap to preserve background services on normal exit and only trigger cleanup on errors, preventing unintended service termination.
Improve log configuration for Docker compatibility: Update eva_ts/config.properties and create eva_ts/log4j2.xml with relative paths to ensure proper operation in containerized environments.
TorchServe Migration: Migrated from ALO ML framework to TorchServe for production-grade model serving with improved reliability and scalability.
Real-Time HTTP-Based Inference: Replaced file-based API communication with real-time HTTP-based inference endpoints for faster and more efficient processing.
Unified API Endpoint: Introduced a FastAPI-based proxy server that consolidates TorchServe's multiple ports (inference, management, and metrics) into a single unified interface while maintaining consistent API endpoint paths.
Optimized OWLv2 Model Handlers: Separated OWLv2 model into dedicated handlers with zero-shot detection and image-guided detection split into independent handlers for optimized batch inference.
Few-Shot Learning Support: Added few-shot learning capabilities through the image-guided detection handler.
Project Architecture Restructured: Reorganized project architecture with clear separation between TorchServe handlers and proxy server components for improved maintainability and scalability.
Utility Modules Streamlined: Optimized utility modules for better reusability across handlers and cleaner codebase organization.
Model Workflow Improved: Enhanced model download and packaging workflow with dedicated scripts for more efficient model management.
LLMDet and OmDet-Turbo Temporarily Removed: Temporarily removed the LLMDet and OmDet-Turbo models during migration to the TorchServe architecture. These models will be reintroduced in version 1.1.0 with TorchServe support.
Add new detection models for enhanced object detection capabilities: Integrate LLMDet and OmDet-Turbo models for open vocabulary zero-shot object detection.
Remove YOLOE model from the supported model list: Due to licensing issues, YOLOE has been excluded from the supported models.
Python 3.10 Compatibility Fixed: Resolved compatibility issues with Python 3.10 to ensure proper functionality across supported Python versions.
Ultralytics Dependencies Removed: Removed ultralytics dependencies to resolve version conflicts and improve package stability.
Bounding Box Handling Error Fixed: Fixed an error in bbox handling that occurred when processing multiple bounding boxes during few-shot learning operations.
Inference Dataset Folder Created: Added inference_dataset folder to establish a consistent inference workflow.