Add transcription provider comparison chain

2026-06-09 12:34:08 +03:00
parent 562fad6f87
commit aaecbb1bed
9 changed files with 600 additions and 57 deletions
--- a/README.md
+++ b/README.md
@@ -46,6 +46,22 @@ or compact `system` / `user` fields. The completed job result contains
 domain metadata fields in `input`, but the worker only reads chat fields such as
 `system`, `user`, `messages`, `max_tokens` and `response_format`.

+`transcription` jobs can run several transcription providers in order for
+temporary A/B comparison. The main `segments` field remains compatible with
+telephony and contains the first successful provider result. The full comparison
+is stored in `attempts` with `provider`, `model`, `status`, `text`, `segments`,
+`duration_ms` and `error`.
+
+Recommended comparison order:
+
+1. `whisperx`
+2. `qwen2-audio` (`Qwen/Qwen2-Audio-7B-Instruct`)
+3. `voxtral-small` (`mistralai/Voxtral-Small-24B-2507`)
+
+Qwen2-Audio and Voxtral are called through an OpenAI-compatible
+`/v1/chat/completions` endpoint with `input_audio`; set their endpoint URLs only
+after the models are actually exposed on the AI server.
+
 ## API

 - `POST /api/v1/jobs` creates one job.
@@ -83,7 +99,17 @@ for Kubernetes probes.
 - `LLM_API_KEY`, primary LLM API key
 - `LLM_MODEL`, default `qwen2.5-14b`
 - `LLM_TIMEOUT`, default `5m`
+- `TRANSCRIPTION_PROVIDERS`, default `whisperx`, comma-separated ordered list:
+  `whisperx,qwen2-audio,voxtral-small`
 - `WHISPERX_URL`, WhisperX endpoint for transcription jobs
+- `QWEN_AUDIO_BASE_URL`, OpenAI-compatible endpoint for Qwen2-Audio
+- `QWEN_AUDIO_MODEL`, default `Qwen/Qwen2-Audio-7B-Instruct`
+- `QWEN_AUDIO_API_KEY`, optional bearer token for Qwen2-Audio
+- `VOXTRAL_BASE_URL`, OpenAI-compatible endpoint for Voxtral
+- `VOXTRAL_MODEL`, default `mistralai/Voxtral-Small-24B-2507`
+- `VOXTRAL_API_KEY`, optional bearer token for Voxtral
+- `AUDIO_LLM_PROMPT`, transcription instruction for audio LLM providers
+- `AUDIO_LLM_MAX_TOKENS`, default `4096`
 - `WORKER_ID`, default hostname
 - `WORKER_HTTP_HOST`, default `0.0.0.0`
 - `WORKER_HTTP_PORT`, default `8081`