Run Voxtral transcription worker with two jobs
@@ -60,6 +60,11 @@ AI-server compose snippet for Voxtral lives in
- Start Voxtral:
`docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile voxtral-small up -d voxtral-small`

In Kubernetes the dedicated transcription worker may claim more than one
`voxtral-small` job at a time. This keeps download/upload/wait overhead from
serializing the queue while Voxtral/vLLM still controls the actual GPU
scheduling.
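
The effect of claiming more than one job can be sketched with a small `asyncio` model. This is only an illustration, not the worker's actual code: the names `run_worker`, `worker_slot`, and `process_job`, the in-memory queue, and the limit of two (taken from the commit title) are all assumptions, and the real claim mechanism is not shown in this diff.

```python
import asyncio

MAX_CONCURRENT_JOBS = 2  # assumption: taken from the commit title ("two jobs")


async def process_job(job_id, stats):
    """Hypothetical job handler: download, wait for transcription, upload."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    try:
        # Stands in for I/O-bound download/upload/wait time; while one job
        # sleeps here, the other claim slot keeps making progress.
        await asyncio.sleep(0.01)
        return f"{job_id}: done"
    finally:
        stats["active"] -= 1


async def worker_slot(queue, results, stats):
    """One claim slot: take the next job, process it, repeat until empty."""
    while True:
        try:
            job_id = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        results.append(await process_job(job_id, stats))


async def run_worker(job_ids, max_jobs=MAX_CONCURRENT_JOBS):
    """Process all jobs with at most `max_jobs` in flight at once."""
    queue = asyncio.Queue()
    for job_id in job_ids:
        queue.put_nowait(job_id)
    results = []
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(
        *(worker_slot(queue, results, stats) for _ in range(max_jobs))
    )
    return results, stats["peak"]
```

Running `asyncio.run(run_worker(["a", "b", "c"]))` processes all three jobs with never more than two in flight. In this model the limit only bounds how many jobs overlap their I/O; it says nothing about GPU scheduling, which the text above leaves to Voxtral/vLLM.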

## API

- `POST /api/v1/jobs` creates one job.