
EVA Agent Dependencies

This guide explains how to install the foundational dependency packages required to run EVA Agent.

Before installing EVA Agent itself, you must first set up Qdrant (the vector database used for data storage) and vLLM (the model inference server).




Understanding the Installation Structure

To ensure a successful installation, review the dependencies and the required installation order between packages.

  1. eva-agent-init: Defines the Storage Class. (Must be installed first)
  2. qdrant / vllm: Uses the storage defined above to store data.
  3. eva-agent: Installed last, after the above services are fully prepared.



Prerequisites

Please verify that the required CLI tools are installed.

  • kubectl: Cluster control tool

  • helm: Package management tool

  • kustomize: Configuration customization tool (required for post-rendering)

    # Install kustomize binary (Linux)
    curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
    chmod +x kustomize
    sudo mv kustomize /usr/local/bin/
    kustomize version
  • On-premise setup (not required when using cloud services such as AWS or NCP)

    • Install k3s

      curl -sfL https://get.k3s.io | sudo sh -s - --docker
      mkdir -p $HOME/.kube
      sudo cp /etc/rancher/k3s/k3s.yaml $HOME/.kube/config
      sudo chown $(id -un):$(id -gn) $HOME/.kube/config
      kubectl version
    • Install NFS CSI Driver

      helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
      helm repo update
      helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --namespace kube-system --version v4.11.0
    • Install NFS server & expose the directory

      sudo apt update
      sudo apt install nfs-kernel-server -y

      NFS_SHARE_PATH=/data001/share/eva-agent
      sudo mkdir -p ${NFS_SHARE_PATH}

      # Create cache directories for EVA Agent / vLLM
      # Set ownership/permissions on the NFS server
      sudo mkdir -p ${NFS_SHARE_PATH}/agent-cache ${NFS_SHARE_PATH}/vllm-cache
      sudo chown -R 10001:10001 ${NFS_SHARE_PATH}
      sudo chmod -R 0775 ${NFS_SHARE_PATH}

      # Allow NFS share only for localhost (assumes single-node k3s)
      # Change the IP address if you need to share to a different node
      echo "${NFS_SHARE_PATH} 127.0.0.1(rw,sync,no_subtree_check,root_squash,anonuid=10001,anongid=10001)" | sudo tee -a /etc/exports

      sudo exportfs -ra
      sudo systemctl restart nfs-kernel-server

      # test
      showmount -e localhost
      sudo mkdir -p /mnt/tmp && sudo mount -t nfs -o rw,nfsvers=4 localhost:${NFS_SHARE_PATH} /mnt/tmp
      sudo umount /mnt/tmp
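
If your cluster nodes are not on localhost, the export entry must allow the client subnet instead of 127.0.0.1. The sketch below composes that line; CLIENT_CIDR is a placeholder for your actual node subnet:

```shell
# Build an /etc/exports entry for a remote client subnet instead of localhost.
# CLIENT_CIDR is a placeholder -- replace with your worker-node subnet.
NFS_SHARE_PATH=/data001/share/eva-agent
CLIENT_CIDR="10.0.0.0/24"
EXPORT_LINE="${NFS_SHARE_PATH} ${CLIENT_CIDR}(rw,sync,no_subtree_check,root_squash,anonuid=10001,anongid=10001)"
echo "$EXPORT_LINE"
# then: echo "$EXPORT_LINE" | sudo tee -a /etc/exports && sudo exportfs -ra
```

Keep root_squash with anonuid/anongid 10001 so files written by root on clients map to the same owner used in the chown step above.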



Register and Update Helm Repositories

Register all required open-source repositories and keep them up to date. Also create the namespace and service account for EVA Agent in advance.

# 1. Add each repository
helm repo add qdrant https://qdrant.github.io/qdrant-helm
helm repo add eva-agent https://mellerikat.github.io/eva-agent

# 2. Update to the latest information
helm repo update

# 3. Create the namespace/service account for EVA Agent
kubectl create namespace eva-agent
kubectl create serviceaccount sa-eva-agent -n eva-agent



Step 1: Install eva-agent-init

This package pre-defines the common Storage Class that Qdrant and vLLM (installed later) use to store data, so installing it is a critical first step.

  • Package role:
    • It configures a dedicated storage class to ensure eva-agent-vllm and eva-agent-qdrant can store data safely.
    • Therefore, it must be installed first, before any other packages.

Download Configuration Files (eva-agent-init)

Download the configuration files from the GitHub repository. Choose the file that matches your environment.

Values templates are organized by EVA Agent version (image tag). Note that Helm charts and images are versioned separately.

# Example: download k3s values
RELEASE_VERSION="2.6.0"
BASE_URL="https://raw.githubusercontent.com/mellerikat/eva-agent/chartmuseum/release/${RELEASE_VERSION}"

mkdir -p eva-agent-init
curl -L \
"$BASE_URL/eva-agent-init/values-k3s.yaml" \
-o eva-agent-init/values-k3s.yaml

If you use k3s + NFS, the share directory in values-k3s.yaml must match the exported directory on the NFS server (e.g., ${NFS_SHARE_PATH}). Model caches can require tens of GB or more, so choose an NFS path on a disk with sufficient capacity (a dedicated data disk mount is recommended if available).

# (Optional) k3s + NFS: set share path to your NFS mount directory
NFS_SHARE_PATH="/data001/share/eva-agent"

# Update storageClass.fileSystem.parameters.share
sed -i "s|^\([[:space:]]*\)share:.*|\1share: $NFS_SHARE_PATH|" eva-agent-init/values-k3s.yaml
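
The substitution can be dry-run on a throwaway snippet first. The variant below captures the existing indentation so YAML nesting is preserved; the field layout shown is an assumption, so compare with your downloaded values-k3s.yaml:

```shell
# Dry-run the share substitution on a sample file before touching the real one.
NFS_SHARE_PATH="/data001/share/eva-agent"
cat > /tmp/sample-values.yaml <<'EOF'
storageClass:
  fileSystem:
    parameters:
      share: /placeholder
EOF
# \1 re-emits the captured leading whitespace, keeping the YAML indent intact
sed -i "s|^\([[:space:]]*\)share:.*|\1share: $NFS_SHARE_PATH|" /tmp/sample-values.yaml
grep 'share:' /tmp/sample-values.yaml
```
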

# Example: install for k3s environment
helm install eva-agent-init eva-agent/eva-agent-init \
--version=1.0.0 \
-n eva-agent \
-f eva-agent-init/values-k3s.yaml



Step 2: Configure eva-agent-qdrant values

Install the Qdrant DB to store vector data.

💡 Note: PVCs/PVs are created using the Storage Class defined in eva-agent-init. Existing PVs are preserved during reinstallation; to remove them entirely, delete them manually with kubectl delete.

Update Settings for Your Environment

| Category | Name | Description | Value |
|---|---|---|---|
| General | nameOverride | Chart name override | eva-agent-qdrant |
| General | fullnameOverride | Full resource name override | "" |
| ServiceAccount | serviceAccount.create | Create ServiceAccount | false |
| ServiceAccount | serviceAccount.name | ServiceAccount name | sa-eva-agent |
| Image | image.pullPolicy | Image pull policy | Always |
| Storage | persistence.accessModes | PVC access modes | ["ReadWriteOnce"] |
| Storage | persistence.size | PVC size allocated to Qdrant | 10Gi |
| Storage | persistence.annotations | Annotations for PVC/PV (add if needed) | {} |
| Storage | persistence.storageVolumeName | PV/volume identifier name (depends on environment) | eva-agent-qdrant-storage |
| Storage | persistence.storageClassName | StorageClass name | eva-agent-sc-bs |
| Snapshot | snapshotPersistence.enabled | Enable dedicated snapshot PVC persistence | true |
| Snapshot | snapshotPersistence.accessModes | Snapshot PVC access modes | ["ReadWriteOnce"] |
| Snapshot | snapshotPersistence.size | Snapshot PVC size | 10Gi |
| Snapshot | snapshotPersistence.annotations | Annotations for snapshot PVC/PV | {} |
| Snapshot | snapshotPersistence.snapshotsVolumeName | PV/volume identifier for snapshot storage | eva-agent-qdrant-snapshots |
| Snapshot | snapshotPersistence.storageClassName | StorageClass name for snapshot PVC | eva-agent-sc-bs |
| Snapshot | snapshotRestoration.enabled | Mount an external restore-source PVC (optional, commented by default) | false (commented) |
| Snapshot | snapshotRestoration.pvcName | External restore-source PVC name (optional, commented by default) | qdrant-snapshot-restore-pvc |
| Snapshot | snapshotRestoration.mountPath | Mount path for external restore source (optional, commented by default) | /qdrant/snapshot-restoration |
| Snapshot | snapshotRestoration.snapshots | Snapshot file list to restore (optional, commented by default) | [] |
| Scheduling | nodeSelector | Node label selector (schedule only on specific nodes) | node.kubernetes.io/instance-type: k3s, beta.kubernetes.io/os: linux |

Download Configuration Files (eva-agent-qdrant)

Download the values and post-renderer plugin templates for Qdrant.

RELEASE_VERSION="2.6.0"
BASE_URL="https://raw.githubusercontent.com/mellerikat/eva-agent/chartmuseum/release/${RELEASE_VERSION}"

# Prepare directories
mkdir -p eva-agent eva-agent-init eva-agent-qdrant eva-agent-vllm \
plugin/eva-agent-qdrant

# Qdrant values & plugin
curl -L "$BASE_URL/eva-agent-qdrant/values.yaml" -o eva-agent-qdrant/values.yaml
curl -L "$BASE_URL/eva-agent-qdrant/values-aws.yaml" -o eva-agent-qdrant/values-aws.yaml
curl -L "$BASE_URL/eva-agent-qdrant/values-ncp.yaml" -o eva-agent-qdrant/values-ncp.yaml
curl -L "$BASE_URL/plugins/eva-agent-qdrant/post-renderer.sh" -o plugin/eva-agent-qdrant/post-renderer.sh
curl -L "$BASE_URL/plugins/eva-agent-qdrant/plugin.yaml" -o plugin/eva-agent-qdrant/plugin.yaml

This installation guide does not include restore procedures. Use the value settings below as operational references:

  • snapshotPersistence.enabled
    • Keep true so /qdrant/snapshots is persisted on PVC.
  • snapshotRestoration.enabled
    • Optional advanced setting to mount an external PVC as a recovery source path.
  • snapshotRestoration.pvcName
    • External PVC name used only when snapshotRestoration.enabled=true.
  • snapshotRestoration.mountPath
    • Mount path for external recovery source files.
  • args
    • Keep chart default startup command: ["./config/initialize.sh"].

Qdrant and vLLM installation will be performed together in the script execution step below.




Step 3: Configure eva-agent-vllm values

Install vLLM, the model inference server. (Agent image version 2.2-a2.0 or later is required)

  • values templates: Available in the GitHub repository.
  • Chart source: This guide installs the custom chart eva-agent/eva-agent-vllm from the eva-agent Helm repository.
  • Detailed value descriptions: Use the release templates in the eva-agent-vllm directory first; Artifact Hub vLLM-stack can be used as an optional engine-level reference.
  • Key setting: For smooth dynamic allocation of PVC/PV, make sure nodeSelector is configured correctly.

💡 Note: PVCs/PVs are created using the Storage Class defined in eva-agent-init. Existing PVs are preserved during reinstallation; to remove them entirely, delete them manually with kubectl delete.


Update Settings for Your Environment

The key Router settings for your deployment environment are listed below; engine-level (servingEngineSpec) settings are described in the value configuration guide that follows.
| Category | Name | Description | Value | Notes |
|---|---|---|---|---|
| Router | routerSpec.enableRouter | Enable Model Router | true | See config guide |
| Router | routerSpec.routingLogic | Routing strategy (roundrobin or session; session sticks by key, otherwise lowest QPS) | "roundrobin" | |
| Router | routerSpec.serviceDiscovery | Backend discovery mode | "static" | See config guide |
| Router | routerSpec.staticBackends | Static backend endpoints | "http://eva-agent-vllm-qwen3-vl-8b-fp8-engine-service.eva-agent.svc.cluster.local,http://external.vllm.ip:port" | See config guide |
| Router | routerSpec.staticModels | Static backend model mapping | "qwen3-vl-8b-instruct-fp8,Exaone4.0" | See config guide |
| Router | routerSpec.serviceType | Router Service type | "ClusterIP" | Default is no external exposure |
| Router | routerSpec.servicePort | Router Service port | 80 | |
| Router | routerSpec.containerPort | Router container port | 8000 | |

Value configuration guide
  • routerSpec.enableRouter
    • The default is always enabled. In production, we recommend keeping it true unless there is a specific reason to disable it.
    • The Router does not create new engines; it routes requests only to existing vLLM endpoints.
    • These endpoints can be engine pods deployed together in the same Helm release, or a different cluster / a different deployment.
    • They do not have to be Kubernetes pods. Any HTTP-reachable vLLM endpoint works: Docker containers, bare-metal processes, Python servers, etc.
    • If you run multiple models on a single node, or if engines are distributed, the Router is effectively required.
    • In short, the Router is a reverse proxy that groups “vLLM endpoints accessible anywhere” into a single entrypoint and distributes traffic.
  • routerSpec.serviceDiscovery
    • Supported values: k8s, static.
    • The recommended value is static. Even if you use only the same cluster, static works and results in a predictable production setup.
    • Configure vLLM engines to be exposed via a non-headless Service (ClusterIP/NodePort).
      Since the internal DNS acts as the primary entrypoint, you can track endpoints using static even within the same cluster.
      From an ops perspective, there is no major difference compared to k8s; it is mainly a management preference.
    • k8s is advantageous in environments where endpoints change frequently (autoscaling, frequent rolling). It follows Endpoints automatically.
    • static is suitable when routing targets are clearly fixed or when you need to include external endpoints. When targets change, you must update the list manually.
  • routerSpec.routingLogic
    • roundrobin: Evenly distributes in list order.
    • session: Sticky routing by session key; if no key is provided, it selects the engine with the lowest QPS.
    • This is not resource-based routing. If you need weighted distribution, repeat the target URL in staticBackends (e.g., 6:1).
  • routerSpec.staticBackends
    • A single-line string separated by commas (,).
    • If you are in the same cluster, do not use a Headless service. Use a ClusterIP/NodePort service as the entrypoint.
    • Service DNS is constructed from the model name:
      • http://eva-agent-vllm-<servingEngineSpec.modelSpec[].name>-engine-service.<namespace>.svc.cluster.local
      • Example (model name qwen3-vl-8b-fp8): http://eva-agent-vllm-qwen3-vl-8b-fp8-engine-service.eva-agent.svc.cluster.local
    • Always include the namespace. The Router URL validation rejects hostnames without a dot (.) as invalid.
    • External endpoint example: http://203.0.113.10/v1
  • routerSpec.staticModels
    • List model names separated by commas (,) in the same order as staticBackends.
    • The model name must exactly match the --served-model-name value in servingEngineSpec.modelSpec[].vllmConfig.extraArgs.
    • If unsure, check the running endpoint with curl <endpoint>/v1/models (pod/container/external server all work).
      • The model name is the id field in the response. Example: {"object":"list","data":[{"id":"qwen3-vl-8b-instruct-fp8"...}]}.
  • servingEngineSpec.modelSpec[].name
    • This value is used to generate the service DNS for routerSpec.staticBackends.
    • We recommend not changing it for the same model. (If you change the model, the Hugging Face repo changes, so the model name must change as well.)
  • servingEngineSpec.modelSpec[].modelURL
    • Enter the Hugging Face repo path of the model to serve. (Example: Qwen/Qwen3-VL-8B-Instruct-FP8)
  • servingEngineSpec.modelSpec[].vllmConfig.tensorParallelSize
    • Meaning: the number of GPUs used by one model (engine pod).
    • Overhead: as TP increases, GPU-to-GPU communication cost increases. Set it to the minimum value that satisfies memory/throughput needs.
    • Constraint: tensorParallelSize must divide the model’s number of attention heads (num_attention_heads).
    • See Server-specific deployment guide for detailed setups.
  • servingEngineSpec.modelSpec[].vllmConfig.gpuMemoryUtilization
    • Controls the upper bound of GPU memory usage.
    • Do not set it to 1.0. KV cache does not include embedding memory for prefix/multimodal inputs, so you need headroom.
    • Use a higher value when the GPU is dedicated (default 0.9), and a lower value when sharing with EVA Vision.
    • For a 24GB GPU, we recommend starting in the 0.8 to 0.85 range.
    • See Server-specific deployment guide for concrete examples.
  • servingEngineSpec.modelSpec[].vllmConfig.extraArgs
    • --served-model-name must match staticModels.
    • kv-cache-dtype depends on GPU architecture; for pre-Ada GPUs, use auto.
  • servingEngineSpec.modelSpec[].requestCPU
    • CPU is independent of GPU. Allocate enough for tokenizer/workers/I/O.
    • If co-located on the same node as EVA Vision, leave CPU headroom for the EVA Vision pod as well.
  • servingEngineSpec.modelSpec[].requestMemory
    • This is system RAM (not VRAM). If too low, you can OOM even if GPU is sufficient.
    • If KV cache pressure is high, keep maxModelLen at 12K and increase RAM or scale up GPU/TP.
  • servingEngineSpec.modelSpec[].requestGPU
    • Important: specifying requestGPU makes Kubernetes allocate the GPU exclusively.
    • If you need GPU sharing, comment out requestGPU and tune with gpuMemoryUtilization / maxModelLen.
    • If EVA Vision uses only part of the GPU, you can run multiple engines with fractional GPU only when MIG/time-slicing is enabled. (If disabled, only integer GPUs are supported.)
    • For dedicated use, keep requestGPU as an integer and tune with gpuMemoryUtilization. Values like 0.9 cannot be used for requestGPU (except MIG/time-slicing).
    • See Server-specific deployment guide for server-specific configurations.
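
The model-name check described above (curl <endpoint>/v1/models) returns JSON; the snippet below shows one way to pull out the id fields without extra tooling. The payload here is a sample inlined for illustration; in practice capture it with curl:

```shell
# Sample /v1/models payload; in practice: RESPONSE=$(curl -s <endpoint>/v1/models)
RESPONSE='{"object":"list","data":[{"id":"qwen3-vl-8b-instruct-fp8","object":"model"}]}'
# Extract every model id -- these must match routerSpec.staticModels exactly
echo "$RESPONSE" | grep -o '"id":"[^"]*"' | cut -d'"' -f4
```
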
Server-specific deployment guide
  • Organize settings based on the actual servers where EVA is deployed.
  • For all cases below, the order of routerSpec.staticBackends and routerSpec.staticModels must match.
  • Environment
    • Share GPU with EVA Vision
    • Connect one additional external vLLM pod (Router routes Qwen3‑VL 8B & Exaone 4.0)
  • Required settings
    • routerSpec.enableRouter: true
    • routerSpec.serviceDiscovery: "static"
    • routerSpec.staticBackends: comma-separated internal engine + external engine URL
      • Internal engine example: http://eva-agent-vllm-qwen3-vl-8b-fp8-engine-service.<namespace>.svc.cluster.local
      • External engine example: http://203.0.113.10/v1
    • routerSpec.staticModels: qwen3-vl-8b-instruct-fp8,Exaone4.0
    • servingEngineSpec.modelSpec[].name: qwen3-vl-8b-fp8
    • servingEngineSpec.modelSpec[].vllmConfig.extraArgs: --served-model-name qwen3-vl-8b-instruct-fp8, --kv-cache-dtype auto
    • servingEngineSpec.modelSpec[].vllmConfig.tensorParallelSize: 1
    • servingEngineSpec.modelSpec[].requestGPU: comment out (share GPU with EVA Vision)
    • servingEngineSpec.modelSpec[].vllmConfig.gpuMemoryUtilization: 0.7 (assuming EVA Vision shares 10GB)
    • servingEngineSpec.modelSpec[].vllmConfig.maxModelLen: 12288
  • Checkpoints
    • For the external engine, verify that /v1/models id is Exaone4.0
    • routerSpec.staticBackends only allows URLs that contain a dot (.) (service/domain must be explicit)
    • Calculation: 48GB - 10GB(EVA Vision) = 38GB → 38/48 ≈ 0.79, set to 0.7 for operational headroom
    • For Ampere GPUs, use --kv-cache-dtype auto
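
The headroom arithmetic in the checkpoints can be reproduced directly; the numbers below are the 48GB/10GB example from above (substitute your own card size and EVA Vision reservation):

```shell
# 48GB card, 10GB reserved for EVA Vision -> fraction available to vLLM
awk 'BEGIN { printf "%.2f\n", (48 - 10) / 48 }'
# prints 0.79; the values file uses 0.7 to leave operational headroom
```
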

Download Configuration Files (eva-agent-vllm)

Download the values templates for vLLM. Among the templates below, the common values.yaml and values-k3s.yaml are based on an NVIDIA A6000 GPU server, and values-aws.yaml is based on a server with a single NVIDIA L40S GPU. Adjust the vLLM-related settings based on the table above and your GPU server specifications.

RELEASE_VERSION="2.6.0"
BASE_URL="https://raw.githubusercontent.com/mellerikat/eva-agent/chartmuseum/release/${RELEASE_VERSION}"

# vLLM values
curl -L "$BASE_URL/eva-agent-vllm/values.yaml" -o eva-agent-vllm/values.yaml
curl -L "$BASE_URL/eva-agent-vllm/values-k3s.yaml" -o eva-agent-vllm/values-k3s.yaml
curl -L "$BASE_URL/eva-agent-vllm/values-aws.yaml" -o eva-agent-vllm/values-aws.yaml
curl -L "$BASE_URL/eva-agent-vllm/values-ncp.yaml" -o eva-agent-vllm/values-ncp.yaml

Step 4: Install Qdrant and vLLM

After updating Qdrant and vLLM values to match your deployment server, install them using the script.

Script options (flags)

You can pass the options below as needed.

  • --namespace <ns>: installation namespace (default: eva-agent)
  • --release <ver>: release version (default: 2.6.0)
  • --base-dir <dir>: base directory for values/plugin directories (default: current directory)
  • --qdrant-chart-version <ver>: Qdrant chart version (default: 1.16.3)
  • --vllm-chart-version <ver>: vLLM chart version (default: 0.1.8)
  • --qdrant-values <file>: additional Qdrant values file (can be used multiple times)
  • --vllm-values <file>: additional vLLM values file (can be used multiple times)

For the full option list/defaults, run ./install_eva_agent_dependencies.sh --help.

# Download the install script
curl -L "https://raw.githubusercontent.com/mellerikat/eva-agent/chartmuseum/install_eva_agent_dependencies.sh" \
-o install_eva_agent_dependencies.sh
chmod +x install_eva_agent_dependencies.sh

# Run after editing values (Qdrant uses a post-renderer; vLLM uses the custom chart directly)
# k3s example
./install_eva_agent_dependencies.sh \
--qdrant-values eva-agent-qdrant/values-k3s.yaml \
--vllm-values eva-agent-vllm/values-k3s.yaml

# AWS example
./install_eva_agent_dependencies.sh \
--qdrant-values eva-agent-qdrant/values-aws.yaml \
--vllm-values eva-agent-vllm/values-aws.yaml

# NCP example
./install_eva_agent_dependencies.sh \
--qdrant-values eva-agent-qdrant/values-ncp.yaml \
--vllm-values eva-agent-vllm/values-ncp.yaml