From 80fa21ff803ca3ad0ec50dadbe93a3f2a39e3a7b Mon Sep 17 00:00:00 2001
From: Grendgi
Date: Wed, 10 Jun 2026 13:43:59 +0300
Subject: [PATCH] Refresh AI service pipeline docs

---
 README.md | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/README.md b/README.md
index 21a516b..c310407 100644
--- a/README.md
+++ b/README.md
@@ -2,8 +2,9 @@

 Technical AI job service for Portal workloads.

-The first version owns only AI job lifecycle and metrics. Business data stays in
-domain services such as `telephony`, `monitoring-tg` and `monitoring-pf`.
+AI Service owns technical AI job lifecycle, provider execution and metrics.
+Business data stays in domain services such as `telephony`, `monitoring-tg` and
+`monitoring-pf`.

 ## Generic job contract

@@ -14,7 +15,8 @@ The service is intentionally domain-agnostic:
 - `owner_ref` is the caller's stable object reference, for example
   `beeline/{call_id}` or `channel/{message_id}`.
 - `task_type` describes the technical task class, for example
-  `transcribe`, `call_analysis`, `tg_analysis`, `pf_competitor_analysis`.
+  `transcription`, `transcript_summary`, `call_analysis`,
+  `telegram_classification`, `tg_analysis`, `pf_competitor_analysis`.
 - `model_profile` selects a runtime profile, for example `whisper-large-v3`,
   `qwen2.5-14b`, `vision`, or a future provider profile.
 - `input` and `result` are JSON payloads owned by the caller and worker.
@@ -24,8 +26,9 @@ service.

 ## Built-in workers

-The first built-in worker processes `llm_chat`, `chat_completion` and
-`call_analysis` jobs whose `model_profile` equals `LLM_MODEL`.
+The LLM worker processes `llm_chat`, `chat_completion`, `call_analysis`,
+`transcript_summary` and `telegram_classification` jobs whose `model_profile`
+equals `LLM_MODEL`.

 Input can be either explicit messages:

@@ -42,9 +45,10 @@ Input can be either explicit messages:
 or compact `system` / `user` fields.

 The completed job result contains `content`, `model`, `usage` and `duration_ms`.
-`call_analysis` uses the same input contract as `llm_chat`; callers may include
-domain metadata fields in `input`, but the worker only reads chat fields such as
-`system`, `user`, `messages`, `max_tokens` and `response_format`.
+`call_analysis` and `transcript_summary` use the same input contract as
+`llm_chat`; callers may include domain metadata fields in `input`, but the
+worker only reads chat fields such as `system`, `user`, `messages`,
+`max_tokens` and `response_format`.

 `transcription` jobs are processed only by Whisper Large v3
 (`openai/whisper-large-v3`) through an OpenAI-compatible
@@ -61,7 +65,7 @@ AI-server compose snippet for Whisper Large v3 lives in

 In Kubernetes the dedicated transcription worker may claim more than one
 `whisper-large-v3` job at a time. This keeps download/upload/wait overhead from
-serializing the queue while Whisper/vLLM still controls the actual GPU
+serializing the queue while the Whisper provider still controls the actual GPU
 scheduling.

 ## API
@@ -113,8 +117,10 @@ for Kubernetes probes.
 - `WORKER_CLAIM_LIMIT`, default `4`
 - `WORKER_LEASE_TIMEOUT`, default `15m`

-## Next integration step
+## Current telephony pipeline

-`telephony` should first mirror low-risk analysis jobs into this service while
-continuing local processing. Remote execution can then be enabled by feature
-flag per task type.
+`telephony` now uses AI Service as the only AI execution path:
+
+1. `transcription` turns call audio into segments.
+2. `transcript_summary` creates a detailed Russian call summary.
+3. `call_analysis` runs tags and negotiation rules against the summary.
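
The three pipeline stages above can be sketched as job payloads built from the generic contract fields (`owner_ref`, `task_type`, `model_profile`, `input`). This is only an illustration under stated assumptions, not the service's actual client code: the helper names `build_job` and `telephony_pipeline`, the `audio_url` input field, the prompt texts, the token limits, and the choice of `qwen2.5-14b` as the configured `LLM_MODEL` are all hypothetical. How a stage result is passed into the next stage's `input` is not described in the patch, so placeholders are used.

```python
import json
from typing import Any

# Assumption: LLM jobs must use the profile configured as LLM_MODEL; the
# README lists `qwen2.5-14b` only as an example profile.
LLM_PROFILE = "qwen2.5-14b"
WHISPER_PROFILE = "whisper-large-v3"


def build_job(owner_ref: str, task_type: str, model_profile: str,
              job_input: dict[str, Any]) -> dict[str, Any]:
    """Assemble a job payload from the contract fields named in the README."""
    return {
        "owner_ref": owner_ref,
        "task_type": task_type,
        "model_profile": model_profile,
        "input": job_input,
    }


def telephony_pipeline(call_id: str, audio_url: str) -> list[dict[str, Any]]:
    """Return the three stage payloads in execution order.

    The "<...>" strings are placeholders for the previous stage's result;
    the real hand-off between stages is not shown in the patch.
    """
    owner_ref = f"beeline/{call_id}"
    return [
        # Stage 1: `audio_url` is a hypothetical input field name.
        build_job(owner_ref, "transcription", WHISPER_PROFILE,
                  {"audio_url": audio_url}),
        # Stage 2: compact `system` / `user` chat fields.
        build_job(owner_ref, "transcript_summary", LLM_PROFILE, {
            "system": "Write a detailed Russian summary of the call.",
            "user": "<transcript text from stage 1>",
            "max_tokens": 1024,
        }),
        # Stage 3: same input contract as `llm_chat`.
        build_job(owner_ref, "call_analysis", LLM_PROFILE, {
            "system": "Apply tags and negotiation rules to the summary.",
            "user": "<summary content from stage 2>",
            "max_tokens": 512,
        }),
    ]


if __name__ == "__main__":
    for job in telephony_pipeline("12345", "https://example.invalid/call.wav"):
        print(json.dumps(job, ensure_ascii=False))
```

Keeping `owner_ref` identical across all three jobs lets the caller correlate the stages with one call record, while business data stays in `telephony`.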