Switch transcription to Whisper large v3
29
README.md
@@ -15,7 +15,7 @@ The service is intentionally domain-agnostic:
 `beeline/{call_id}` or `channel/{message_id}`.
 - `task_type` describes the technical task class, for example
 `transcribe`, `call_analysis`, `tg_analysis`, `pf_competitor_analysis`.
-- `model_profile` selects a runtime profile, for example `voxtral-small`,
+- `model_profile` selects a runtime profile, for example `whisper-large-v3`,
 `qwen2.5-14b`, `vision`, or a future provider profile.
 - `input` and `result` are JSON payloads owned by the caller and worker.

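To make the field list in the hunk above concrete, here is a minimal sketch of a job record that uses the new profile name. Only `task_type`, `model_profile`, `input` and `result` come from the README; the `JobEnvelope` class, the `make_transcribe_job` helper and the `audio_url` input key are hypothetical, since the README leaves the payload shape to the caller.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class JobEnvelope:
    """Hypothetical job record; only the four field names come from the README."""

    task_type: str
    model_profile: str
    # Caller-owned JSON payload; its keys are not prescribed by the service.
    input: dict[str, Any] = field(default_factory=dict)
    # Filled in by the worker once the job completes.
    result: Optional[dict[str, Any]] = None


def make_transcribe_job(audio_url: str) -> JobEnvelope:
    # `audio_url` is an assumed input key used purely for illustration.
    return JobEnvelope(
        task_type="transcribe",
        model_profile="whisper-large-v3",
        input={"audio_url": audio_url},
    )


job = make_transcribe_job("https://example.invalid/call.wav")
print(json.dumps(asdict(job), indent=2))
```

Callers that previously submitted `voxtral-small` would, under this reading of the diff, only need to change the `model_profile` string.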
@@ -46,23 +46,22 @@ or compact `system` / `user` fields. The completed job result contains
 domain metadata fields in `input`, but the worker only reads chat fields such as
 `system`, `user`, `messages`, `max_tokens` and `response_format`.

-`transcription` jobs are processed only by Voxtral Small
-(`mistralai/Voxtral-Small-24B-2507`) through an OpenAI-compatible
+`transcription` jobs are processed only by Whisper Large v3
+(`openai/whisper-large-v3`) through an OpenAI-compatible
 `/v1/audio/transcriptions` endpoint. The returned `segments` field stays
 compatible with telephony. If the provider returns one long segment, AI Service
-splits it into smaller transcript segments and adds heuristic speaker labels
-when diarization is requested.
+splits it into smaller transcript segments without inventing speaker labels.

-AI-server compose snippet for Voxtral lives in
+AI-server compose snippet for Whisper Large v3 lives in
 `deploy/ai-server/docker-compose.audio.yml`:

-- Voxtral endpoint: `http://10.2.3.5:8004`
-- Start Voxtral:
-`docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile voxtral-small up -d voxtral-small`
+- Whisper endpoint: `http://10.2.3.5:8004`
+- Start Whisper:
+`docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile whisper-large-v3 up -d whisper-large-v3`

 In Kubernetes the dedicated transcription worker may claim more than one
-`voxtral-small` job at a time. This keeps download/upload/wait overhead from
-serializing the queue while Voxtral/vLLM still controls the actual GPU
+`whisper-large-v3` job at a time. This keeps download/upload/wait overhead from
+serializing the queue while Whisper/vLLM still controls the actual GPU
 scheduling.

 ## API
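The hunk above says a single long provider segment is split into smaller transcript segments and that speaker labels are no longer invented. The diff does not show how the service does this, so the following is only a sketch of one plausible approach: pack sentences into chunks and interpolate timestamps by character count. The `start`, `end` and `text` keys are assumed to follow the usual verbose transcription segment shape, and `split_segment` and `max_chars` are hypothetical names.

```python
import re


def split_segment(segment: dict, max_chars: int = 80) -> list[dict]:
    """Split one long segment into smaller ones; never adds a speaker field."""
    text = segment["text"].strip()
    if len(text) <= max_chars:
        return [{"start": segment["start"], "end": segment["end"], "text": text}]

    # Break on sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

    # Greedily pack sentences into chunks of at most max_chars characters.
    # A single sentence longer than max_chars is kept whole.
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)

    # Distribute the original time span in proportion to chunk length.
    total_chars = sum(len(c) for c in chunks)
    duration = segment["end"] - segment["start"]
    parts = []
    cursor = segment["start"]
    consumed = 0
    for chunk in chunks:
        consumed += len(chunk)
        end = segment["start"] + duration * consumed / total_chars
        parts.append({"start": round(cursor, 3), "end": round(end, 3), "text": chunk})
        cursor = end
    return parts


long_segment = {"start": 0.0, "end": 12.0, "text": "Hello there. " * 12}
for part in split_segment(long_segment):
    print(part)
```

Character-proportional timing is a rough estimate, not an alignment; it is used here only because the transcription response in this scenario carries no finer timestamps.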
@@ -102,11 +101,11 @@ for Kubernetes probes.
 - `LLM_API_KEY`, primary LLM API key
 - `LLM_MODEL`, default `qwen2.5-14b`
 - `LLM_TIMEOUT`, default `5m`
-- `VOXTRAL_BASE_URL`, OpenAI-compatible endpoint for Voxtral
-- `VOXTRAL_MODEL`, default `mistralai/Voxtral-Small-24B-2507`
-- `VOXTRAL_API_KEY`, optional bearer token for Voxtral; falls back to
+- `AUDIO_TRANSCRIPTION_BASE_URL`, OpenAI-compatible transcription endpoint
+- `AUDIO_TRANSCRIPTION_MODEL`, default `openai/whisper-large-v3`
+- `AUDIO_TRANSCRIPTION_API_KEY`, optional bearer token; falls back to
 `AUDIO_LLM_API_KEY`, then `LLM_API_KEY`
-- `AUDIO_LLM_PROMPT`, transcription instruction for Voxtral
+- `AUDIO_TRANSCRIPTION_PROMPT`, transcription instruction
 - `WORKER_ID`, default hostname
 - `WORKER_HTTP_HOST`, default `0.0.0.0`
 - `WORKER_HTTP_PORT`, default `8081`
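The renamed variables in the hunk above keep the same token fallback order: `AUDIO_TRANSCRIPTION_API_KEY`, then `AUDIO_LLM_API_KEY`, then `LLM_API_KEY`. A minimal sketch of that lookup follows; the function names are hypothetical, and treating an empty value as unset is an assumption rather than something the README states.

```python
import os
from typing import Mapping, Optional

# Fallback order as documented in the README hunk.
API_KEY_FALLBACKS = (
    "AUDIO_TRANSCRIPTION_API_KEY",
    "AUDIO_LLM_API_KEY",
    "LLM_API_KEY",
)


def resolve_transcription_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty key in the documented fallback order."""
    for name in API_KEY_FALLBACKS:
        # Assumption: empty or whitespace-only values count as unset.
        value = env.get(name, "").strip()
        if value:
            return value
    return None  # The bearer token is optional.


def resolve_transcription_model(env: Mapping[str, str] = os.environ) -> str:
    # The default value is taken from the README.
    return env.get("AUDIO_TRANSCRIPTION_MODEL", "openai/whisper-large-v3")


print(resolve_transcription_api_key({"AUDIO_LLM_API_KEY": "b", "LLM_API_KEY": "c"}))
print(resolve_transcription_model({}))
```

Deployments that still export only the removed `VOXTRAL_*` variables would get the defaults under this sketch, so their manifests likely need the new names.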