Switch transcription to Whisper large v3

2026-06-10 10:10:13 +03:00
parent 1b63dcdbf5
commit 8d6cd84403
12 changed files with 85 additions and 93 deletions
--- a/README.md
+++ b/README.md
@@ -15,7 +15,7 @@ The service is intentionally domain-agnostic:
  `beeline/{call_id}` or `channel/{message_id}`.
 - `task_type` describes the technical task class, for example
  `transcribe`, `call_analysis`, `tg_analysis`, `pf_competitor_analysis`.
- `model_profile` selects a runtime profile, for example `voxtral-small`,
+- `model_profile` selects a runtime profile, for example `whisper-large-v3`,
  `qwen2.5-14b`, `vision`, or a future provider profile.
 - `input` and `result` are JSON payloads owned by the caller and worker.

@@ -46,23 +46,22 @@ or compact `system` / `user` fields. The completed job result contains
 domain metadata fields in `input`, but the worker only reads chat fields such as
 `system`, `user`, `messages`, `max_tokens` and `response_format`.

-`transcription` jobs are processed only by Voxtral Small
-(`mistralai/Voxtral-Small-24B-2507`) through an OpenAI-compatible
+`transcription` jobs are processed only by Whisper Large v3
+(`openai/whisper-large-v3`) through an OpenAI-compatible
 `/v1/audio/transcriptions` endpoint. The returned `segments` field stays
 compatible with telephony. If the provider returns one long segment, AI Service
-splits it into smaller transcript segments and adds heuristic speaker labels
-when diarization is requested.
+splits it into smaller transcript segments without inventing speaker labels.

-AI-server compose snippet for Voxtral lives in
+AI-server compose snippet for Whisper Large v3 lives in
 `deploy/ai-server/docker-compose.audio.yml`:

- Voxtral endpoint: `http://10.2.3.5:8004`
- Start Voxtral:
-  `docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile voxtral-small up -d voxtral-small`
+- Whisper endpoint: `http://10.2.3.5:8004`
+- Start Whisper:
+  `docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile whisper-large-v3 up -d whisper-large-v3`

 In Kubernetes the dedicated transcription worker may claim more than one
-`voxtral-small` job at a time. This keeps download/upload/wait overhead from
-serializing the queue while Voxtral/vLLM still controls the actual GPU
+`whisper-large-v3` job at a time. This keeps download/upload/wait overhead from
+serializing the queue while Whisper/vLLM still controls the actual GPU
 scheduling.

 ## API
@@ -102,11 +101,11 @@ for Kubernetes probes.
 - `LLM_API_KEY`, primary LLM API key
 - `LLM_MODEL`, default `qwen2.5-14b`
 - `LLM_TIMEOUT`, default `5m`
- `VOXTRAL_BASE_URL`, OpenAI-compatible endpoint for Voxtral
- `VOXTRAL_MODEL`, default `mistralai/Voxtral-Small-24B-2507`
- `VOXTRAL_API_KEY`, optional bearer token for Voxtral; falls back to
+- `AUDIO_TRANSCRIPTION_BASE_URL`, OpenAI-compatible transcription endpoint
+- `AUDIO_TRANSCRIPTION_MODEL`, default `openai/whisper-large-v3`
+- `AUDIO_TRANSCRIPTION_API_KEY`, optional bearer token; falls back to
  `AUDIO_LLM_API_KEY`, then `LLM_API_KEY`
- `AUDIO_LLM_PROMPT`, transcription instruction for Voxtral
+- `AUDIO_TRANSCRIPTION_PROMPT`, transcription instruction
 - `WORKER_ID`, default hostname
 - `WORKER_HTTP_HOST`, default `0.0.0.0`
 - `WORKER_HTTP_PORT`, default `8081`
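
Note on the renamed settings: the README hunk above says `AUDIO_TRANSCRIPTION_API_KEY` falls back to `AUDIO_LLM_API_KEY`, then `LLM_API_KEY`, and that `AUDIO_TRANSCRIPTION_MODEL` defaults to `openai/whisper-large-v3`. A minimal sketch of that lookup order follows. It is not the service's actual code: the function name `resolve_transcription_config` and the returned dictionary keys are hypothetical, and treating empty values as unset is an assumption.

```python
import os
from typing import Mapping, Optional

# Default taken from the README text in this commit.
DEFAULT_MODEL = "openai/whisper-large-v3"


def resolve_transcription_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: read the transcription settings named in the README.

    Empty strings are treated as unset (an assumption, not confirmed by the source).
    """
    env = os.environ if env is None else env

    # Fallback chain described in the README:
    # AUDIO_TRANSCRIPTION_API_KEY -> AUDIO_LLM_API_KEY -> LLM_API_KEY
    api_key = (
        env.get("AUDIO_TRANSCRIPTION_API_KEY")
        or env.get("AUDIO_LLM_API_KEY")
        or env.get("LLM_API_KEY")
        or None
    )

    return {
        "base_url": env.get("AUDIO_TRANSCRIPTION_BASE_URL", "").rstrip("/"),
        "model": env.get("AUDIO_TRANSCRIPTION_MODEL") or DEFAULT_MODEL,
        "api_key": api_key,  # None means no bearer token is sent
        "prompt": env.get("AUDIO_TRANSCRIPTION_PROMPT") or None,
    }
```

Passing a plain mapping instead of reading `os.environ` directly keeps the lookup easy to check in isolation, for example `resolve_transcription_config({"LLM_API_KEY": "k"})`.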