Run Voxtral transcription worker with two jobs
@@ -60,6 +60,11 @@ AI-server compose snippet for Voxtral lives in
 - Start Voxtral:
 `docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile voxtral-small up -d voxtral-small`
 
+In Kubernetes the dedicated transcription worker may claim more than one
+`voxtral-small` job at a time. This keeps download/upload/wait overhead from
+serializing the queue while Voxtral/vLLM still controls the actual GPU
+scheduling.
+
 ## API
 
 - `POST /api/v1/jobs` creates one job.
@@ -100,7 +100,7 @@ spec:
 - name: WORKER_MODEL_PROFILES
   value: "voxtral-small"
 - name: WORKER_CLAIM_LIMIT
-  value: "1"
+  value: "2"
 envFrom:
 - configMapRef:
     name: ai-service-config
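A minimal sketch of what raising the claim limit from 1 to 2 means for a worker loop, assuming an asyncio-based worker. Only the `WORKER_CLAIM_LIMIT` variable name comes from the manifest in this commit; `run_worker`, `process_job`, and the sleep placeholder are hypothetical names for illustration and are not taken from this repository.

```python
import asyncio
import os

# Assumption: the worker reads its concurrency cap from WORKER_CLAIM_LIMIT.
CLAIM_LIMIT = int(os.environ.get("WORKER_CLAIM_LIMIT", "2"))


async def process_job(job_id, active, peak):
    """Hypothetical stand-in for one job: download, wait on Voxtral, upload."""
    active.append(job_id)
    peak[0] = max(peak[0], len(active))  # record the highest concurrency seen
    await asyncio.sleep(0.01)  # placeholder for I/O-bound overhead
    active.remove(job_id)
    return job_id


async def run_worker(job_ids, claim_limit=CLAIM_LIMIT):
    """Process jobs with at most `claim_limit` of them in flight at once."""
    slots = asyncio.Semaphore(claim_limit)
    active, peak = [], [0]

    async def claim_and_run(job_id):
        # A job only starts once a claim slot is free.
        async with slots:
            return await process_job(job_id, active, peak)

    done = await asyncio.gather(*(claim_and_run(j) for j in job_ids))
    return list(done), peak[0]


if __name__ == "__main__":
    finished, peak = asyncio.run(run_worker(["a", "b", "c", "d"], claim_limit=2))
    print(finished, peak)
```

With a limit of 1, the download and upload of each job would run strictly one after another. With a limit of 2, one job's I/O can overlap with another job's wait on the model, while the inference server still decides how GPU work is scheduled.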