Make Voxtral the only transcription provider

2026-06-09 16:54:54 +03:00
parent 5c965be8c9
commit 9bd6d726f0
15 changed files with 128 additions and 900 deletions
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ The service is intentionally domain-agnostic:
  `beeline/{call_id}` or `channel/{message_id}`.
 - `task_type` describes the technical task class, for example
  `transcribe`, `call_analysis`, `tg_analysis`, `pf_competitor_analysis`.
- `model_profile` selects a runtime profile, for example `whisperx`,
+- `model_profile` selects a runtime profile, for example `voxtral-small`,
  `qwen2.5-14b`, `vision`, or a future provider profile.
 - `input` and `result` are JSON payloads owned by the caller and worker.

@@ -46,32 +46,18 @@ or compact `system` / `user` fields. The completed job result contains
 domain metadata fields in `input`, but the worker only reads chat fields such as
 `system`, `user`, `messages`, `max_tokens` and `response_format`.

-`transcription` jobs can run several transcription providers in order for
-temporary A/B comparison. The main `segments` field remains compatible with
-telephony and contains the first successful provider result. The full comparison
-is stored in `attempts` with `provider`, `model`, `status`, `text`, `segments`,
-`duration_ms` and `error`.
+`transcription` jobs are processed only by Voxtral Small
+(`mistralai/Voxtral-Small-24B-2507`) through an OpenAI-compatible
+`/v1/audio/transcriptions` endpoint. The returned `segments` field stays
+compatible with telephony. If the provider returns one long segment, AI Service
+splits it into smaller transcript segments and adds heuristic speaker labels
+when diarization is requested.

-Recommended comparison order:
+The AI-server compose snippet for Voxtral lives in
+`deploy/ai-server/docker-compose.audio.yml`:

-1. `whisperx`
-2. `qwen2-audio` (`Qwen/Qwen2-Audio-7B-Instruct`)
-3. `voxtral-small` (`mistralai/Voxtral-Small-24B-2507`)
-
-Qwen2-Audio and Voxtral are called through an OpenAI-compatible
-`/v1/chat/completions` endpoint with vLLM-style `audio_url` data URLs; set
-their endpoint URLs only after the models are actually exposed on the AI server.
-
-AI-server compose snippets for these temporary comparison endpoints live in
-`deploy/ai-server/docker-compose.audio.yml`. They are profile-gated because the
-single GPU cannot keep the production text vLLM, two WhisperX instances, Qwen2
-Audio and Voxtral loaded at the same time:
-
- Qwen2-Audio endpoint: `http://10.2.3.5:8003`
 - Voxtral endpoint: `http://10.2.3.5:8004`
- Start Qwen only:
-  `docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile qwen-audio up -d qwen-audio`
- Start Voxtral only:
+- Start Voxtral:
  `docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile voxtral-small up -d voxtral-small`

 ## API
@@ -90,7 +76,7 @@ Audio and Voxtral loaded at the same time:
 - `GET /api/v1/providers/status` checks configured AI providers without
  returning secrets.
 - `GET /api/v1/infra/status` returns AI-server sidecar telemetry
-  (GPU, containers, vLLM and WhisperX live metrics) when configured.
+  (GPU, containers and vLLM live metrics) when configured.
 - `GET /healthz` returns process health.
 - `GET /readyz` checks PostgreSQL readiness.
 - Built-in workers expose open Kubernetes endpoints on `WORKER_HTTP_PORT`:
@@ -111,19 +97,11 @@ for Kubernetes probes.
 - `LLM_API_KEY`, primary LLM API key
 - `LLM_MODEL`, default `qwen2.5-14b`
 - `LLM_TIMEOUT`, default `5m`
- `TRANSCRIPTION_PROVIDERS`, default `whisperx`, comma-separated ordered list:
-  `whisperx,qwen2-audio,voxtral-small`
- `WHISPERX_URL`, WhisperX endpoint for transcription jobs
- `QWEN_AUDIO_BASE_URL`, OpenAI-compatible endpoint for Qwen2-Audio
- `QWEN_AUDIO_MODEL`, default `Qwen/Qwen2-Audio-7B-Instruct`
- `QWEN_AUDIO_API_KEY`, optional bearer token for Qwen2-Audio; falls back to
-  `AUDIO_LLM_API_KEY`, then `LLM_API_KEY`
 - `VOXTRAL_BASE_URL`, OpenAI-compatible endpoint for Voxtral
 - `VOXTRAL_MODEL`, default `mistralai/Voxtral-Small-24B-2507`
 - `VOXTRAL_API_KEY`, optional bearer token for Voxtral; falls back to
  `AUDIO_LLM_API_KEY`, then `LLM_API_KEY`
- `AUDIO_LLM_PROMPT`, transcription instruction for audio LLM providers
- `AUDIO_LLM_MAX_TOKENS`, default `4096`
+- `AUDIO_LLM_PROMPT`, transcription instruction for Voxtral
 - `WORKER_ID`, default hostname
 - `WORKER_HTTP_HOST`, default `0.0.0.0`
 - `WORKER_HTTP_PORT`, default `8081`
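
Illustrative sketch, not part of this commit: the README hunks above describe a bearer-token fallback (`VOXTRAL_API_KEY`, then `AUDIO_LLM_API_KEY`, then `LLM_API_KEY`), a default model name, and an OpenAI-compatible `/v1/audio/transcriptions` endpoint. The Python below shows one way those settings could be combined into a request description. The function names, the treatment of blank values as unset, the assumption that `VOXTRAL_BASE_URL` does not already end in `/v1`, and the mapping of `AUDIO_LLM_PROMPT` to the `prompt` form field are all assumptions made for illustration; the service's actual implementation may differ.

```python
from typing import Mapping, Optional

# Default taken from the README; everything else here is a hypothetical helper.
DEFAULT_VOXTRAL_MODEL = "mistralai/Voxtral-Small-24B-2507"


def resolve_api_key(env: Mapping[str, str]) -> Optional[str]:
    """Return the first non-blank key in the documented fallback order."""
    for name in ("VOXTRAL_API_KEY", "AUDIO_LLM_API_KEY", "LLM_API_KEY"):
        value = env.get(name, "").strip()
        if value:  # assumption: blank values count as unset
            return value
    return None  # the token is optional, so no Authorization header is sent


def build_transcription_request(env: Mapping[str, str]) -> dict:
    """Describe a POST to the OpenAI-compatible transcription endpoint."""
    # Assumption: VOXTRAL_BASE_URL is the server root, without a /v1 suffix.
    base_url = env["VOXTRAL_BASE_URL"].rstrip("/")

    headers = {}
    key = resolve_api_key(env)
    if key:
        headers["Authorization"] = f"Bearer {key}"

    fields = {"model": env.get("VOXTRAL_MODEL") or DEFAULT_VOXTRAL_MODEL}
    prompt = env.get("AUDIO_LLM_PROMPT", "").strip()
    if prompt:
        # Assumption: the instruction is passed as the `prompt` form field.
        fields["prompt"] = prompt

    return {
        "url": f"{base_url}/v1/audio/transcriptions",
        "headers": headers,
        "fields": fields,
    }
```

With only `VOXTRAL_BASE_URL` and `LLM_API_KEY` set, the sketch falls through to `LLM_API_KEY` for the bearer token and uses the default model. The audio file itself would be attached as the multipart `file` part by whichever HTTP client sends the request.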