Run Voxtral transcription worker with two jobs
Some checks failed
CI / test (push) Failing after 9s
Build and Deploy / build-and-deploy (push) Successful in 19s

Grendgi
2026-06-09 17:16:24 +03:00
parent 9bd6d726f0
commit e074f6b226
2 changed files with 6 additions and 1 deletion


@@ -60,6 +60,11 @@ AI-server compose snippet for Voxtral lives in
- Start Voxtral:
`docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile voxtral-small up -d voxtral-small`
+In Kubernetes the dedicated transcription worker may claim more than one
+`voxtral-small` job at a time. This keeps download/upload/wait overhead from
+serializing the queue while Voxtral/vLLM still controls the actual GPU
+scheduling.
## API
- `POST /api/v1/jobs` creates one job.


@@ -100,7 +100,7 @@ spec:
   - name: WORKER_MODEL_PROFILES
     value: "voxtral-small"
   - name: WORKER_CLAIM_LIMIT
-    value: "1"
+    value: "2"
   envFrom:
   - configMapRef:
       name: ai-service-config
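
Note on the effect of `WORKER_CLAIM_LIMIT`: the sketch below illustrates why raising the limit from 1 to 2 lets per-job I/O overlap. It is not the worker's actual code. `process_job`, `run_worker`, and `claim_limit_from_env` are hypothetical names, and the sleep stands in for the download, transcription request, and upload steps. It assumes the worker is I/O-bound while waiting and that GPU scheduling is handled elsewhere by Voxtral/vLLM.

```python
import asyncio
import os


async def process_job(job_id: str, delay: float = 0.01) -> str:
    # Hypothetical stand-in for download -> transcription request -> upload.
    await asyncio.sleep(delay)
    return f"{job_id}:done"


async def run_worker(job_ids: list[str], claim_limit: int) -> tuple[list[str], int]:
    """Process jobs with at most `claim_limit` in flight.

    Returns the results in input order and the peak number of jobs
    that were in flight at the same time.
    """
    slots = asyncio.Semaphore(claim_limit)
    in_flight = 0
    peak = 0

    async def claim_and_process(job_id: str) -> str:
        nonlocal in_flight, peak
        async with slots:  # a job is "claimed" only while a slot is free
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await process_job(job_id)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(claim_and_process(j) for j in job_ids))
    return list(results), peak


def claim_limit_from_env(default: int = 1) -> int:
    # Reads the variable set in the manifest; falls back to one job at a time.
    return max(1, int(os.environ.get("WORKER_CLAIM_LIMIT", default)))


if __name__ == "__main__":
    limit = claim_limit_from_env()
    done, peak = asyncio.run(run_worker(["a", "b", "c", "d"], limit))
    print(done, "peak in flight:", peak)
```

With a limit of 1 the jobs run strictly one after another; with a limit of 2 the waiting time of one job overlaps with the other, which is the overhead this commit targets.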