Add AI server audio model profiles
This commit is contained in:
18
README.md
18
README.md
@@ -62,6 +62,18 @@ Qwen2-Audio and Voxtral are called through an OpenAI-compatible
|
||||
`/v1/chat/completions` endpoint with `input_audio`; set their endpoint URLs only
|
||||
after the models are actually exposed on the AI server.
|
||||
|
||||
AI-server compose snippets for these temporary comparison endpoints live in
|
||||
`deploy/ai-server/docker-compose.audio.yml`. They are profile-gated because the
|
||||
single GPU cannot keep the production text vLLM, two WhisperX instances, Qwen2
|
||||
Audio and Voxtral loaded at the same time:
|
||||
|
||||
- Qwen2-Audio endpoint: `http://10.2.3.5:8003`
|
||||
- Voxtral endpoint: `http://10.2.3.5:8004`
|
||||
- Start Qwen only:
|
||||
`docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile qwen-audio up -d qwen-audio`
|
||||
- Start Voxtral only:
|
||||
`docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile voxtral-small up -d voxtral-small`
|
||||
|
||||
## API
|
||||
|
||||
- `POST /api/v1/jobs` creates one job.
|
||||
@@ -104,10 +116,12 @@ for Kubernetes probes.
|
||||
- `WHISPERX_URL`, WhisperX endpoint for transcription jobs
|
||||
- `QWEN_AUDIO_BASE_URL`, OpenAI-compatible endpoint for Qwen2-Audio
|
||||
- `QWEN_AUDIO_MODEL`, default `Qwen/Qwen2-Audio-7B-Instruct`
|
||||
- `QWEN_AUDIO_API_KEY`, optional bearer token for Qwen2-Audio
|
||||
- `QWEN_AUDIO_API_KEY`, optional bearer token for Qwen2-Audio; falls back to
|
||||
`AUDIO_LLM_API_KEY`, then `LLM_API_KEY`
|
||||
- `VOXTRAL_BASE_URL`, OpenAI-compatible endpoint for Voxtral
|
||||
- `VOXTRAL_MODEL`, default `mistralai/Voxtral-Small-24B-2507`
|
||||
- `VOXTRAL_API_KEY`, optional bearer token for Voxtral
|
||||
- `VOXTRAL_API_KEY`, optional bearer token for Voxtral; falls back to
|
||||
`AUDIO_LLM_API_KEY`, then `LLM_API_KEY`
|
||||
- `AUDIO_LLM_PROMPT`, transcription instruction for audio LLM providers
|
||||
- `AUDIO_LLM_MAX_TOKENS`, default `4096`
|
||||
- `WORKER_ID`, default hostname
|
||||
|
||||
Reference in New Issue
Block a user