Add AI server audio model profiles

2026-06-09 12:50:56 +03:00
parent aaecbb1bed
commit f49ba7abd5
3 changed files with 128 additions and 4 deletions
--- a/README.md
+++ b/README.md
@@ -62,6 +62,18 @@ Qwen2-Audio and Voxtral are called through an OpenAI-compatible
 `/v1/chat/completions` endpoint with `input_audio`; set their endpoint URLs only
 after the models are actually exposed on the AI server.

+AI-server compose snippets for these temporary comparison endpoints live in
+`deploy/ai-server/docker-compose.audio.yml`. They are profile-gated because the
+single GPU cannot keep the production text vLLM, two WhisperX instances,
+Qwen2-Audio and Voxtral loaded at the same time:
+
+- Qwen2-Audio endpoint: `http://10.2.3.5:8003`
+- Voxtral endpoint: `http://10.2.3.5:8004`
+- Start Qwen only:
+  `docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile qwen-audio up -d qwen-audio`
+- Start Voxtral only:
+  `docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile voxtral-small up -d voxtral-small`
+
 ## API

 - `POST /api/v1/jobs` creates one job.
@@ -104,10 +116,12 @@ for Kubernetes probes.
 - `WHISPERX_URL`, WhisperX endpoint for transcription jobs
 - `QWEN_AUDIO_BASE_URL`, OpenAI-compatible endpoint for Qwen2-Audio
 - `QWEN_AUDIO_MODEL`, default `Qwen/Qwen2-Audio-7B-Instruct`
-- `QWEN_AUDIO_API_KEY`, optional bearer token for Qwen2-Audio
+- `QWEN_AUDIO_API_KEY`, optional bearer token for Qwen2-Audio; falls back to
+  `AUDIO_LLM_API_KEY`, then `LLM_API_KEY`
 - `VOXTRAL_BASE_URL`, OpenAI-compatible endpoint for Voxtral
 - `VOXTRAL_MODEL`, default `mistralai/Voxtral-Small-24B-2507`
-- `VOXTRAL_API_KEY`, optional bearer token for Voxtral
+- `VOXTRAL_API_KEY`, optional bearer token for Voxtral; falls back to
+  `AUDIO_LLM_API_KEY`, then `LLM_API_KEY`
 - `AUDIO_LLM_PROMPT`, transcription instruction for audio LLM providers
 - `AUDIO_LLM_MAX_TOKENS`, default `4096`
 - `WORKER_ID`, default hostname
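
Note on the key fallback documented in the hunk above: the README now says each provider key falls back to `AUDIO_LLM_API_KEY`, then `LLM_API_KEY`. A minimal sketch of that lookup order follows. The function name `resolve_api_key` is hypothetical, and treating an empty or whitespace-only value as unset is an assumption; the service's actual implementation is not shown in this diff and may differ.

```python
import os
from typing import Mapping, Optional

# Shared fallbacks, in the order given in the README hunk.
FALLBACK_VARS = ("AUDIO_LLM_API_KEY", "LLM_API_KEY")


def resolve_api_key(
    provider_var: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the first non-empty key among provider_var and the fallbacks.

    Hypothetical helper. Empty or whitespace-only values count as unset
    (an assumption, not confirmed by the diff).
    """
    env = os.environ if env is None else env
    for name in (provider_var, *FALLBACK_VARS):
        value = env.get(name, "").strip()
        if value:
            return value
    # The README calls the token optional, so no key at all is a valid result.
    return None


if __name__ == "__main__":
    # Show which providers currently resolve to a key, without printing it.
    for var in ("QWEN_AUDIO_API_KEY", "VOXTRAL_API_KEY"):
        print(var, "resolved" if resolve_api_key(var) else "unset")
```

Passing an explicit mapping instead of reading `os.environ` keeps the lookup easy to test, and returning `None` matches the "optional bearer token" wording.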