Refresh AI service pipeline docs

2026-06-10 13:43:59 +03:00
parent 7d0e27f681
commit 80fa21ff80
1 changed files with 19 additions and 13 deletions
--- a/README.md
+++ b/README.md
@@ -2,8 +2,9 @@

 Technical AI job service for Portal workloads.

-The first version owns only AI job lifecycle and metrics. Business data stays in
-domain services such as `telephony`, `monitoring-tg` and `monitoring-pf`.
+AI Service owns technical AI job lifecycle, provider execution and metrics.
+Business data stays in domain services such as `telephony`, `monitoring-tg` and
+`monitoring-pf`.

 ## Generic job contract

@@ -14,7 +15,8 @@ The service is intentionally domain-agnostic:
 - `owner_ref` is the caller's stable object reference, for example
  `beeline/{call_id}` or `channel/{message_id}`.
 - `task_type` describes the technical task class, for example
-  `transcribe`, `call_analysis`, `tg_analysis`, `pf_competitor_analysis`.
+  `transcription`, `transcript_summary`, `call_analysis`,
+  `telegram_classification`, `tg_analysis`, `pf_competitor_analysis`.
 - `model_profile` selects a runtime profile, for example `whisper-large-v3`,
  `qwen2.5-14b`, `vision`, or a future provider profile.
 - `input` and `result` are JSON payloads owned by the caller and worker.
@@ -24,8 +26,9 @@ service.

 ## Built-in workers

-The first built-in worker processes `llm_chat`, `chat_completion` and
-`call_analysis` jobs whose `model_profile` equals `LLM_MODEL`.
+The LLM worker processes `llm_chat`, `chat_completion`, `call_analysis`,
+`transcript_summary` and `telegram_classification` jobs whose `model_profile`
+equals `LLM_MODEL`.

 Input can be either explicit messages:

@@ -42,9 +45,10 @@ Input can be either explicit messages:
 or compact `system` / `user` fields. The completed job result contains
 `content`, `model`, `usage` and `duration_ms`.

-`call_analysis` uses the same input contract as `llm_chat`; callers may include
-domain metadata fields in `input`, but the worker only reads chat fields such as
-`system`, `user`, `messages`, `max_tokens` and `response_format`.
+`call_analysis` and `transcript_summary` use the same input contract as
+`llm_chat`; callers may include domain metadata fields in `input`, but the
+worker only reads chat fields such as `system`, `user`, `messages`,
+`max_tokens` and `response_format`.

 `transcription` jobs are processed only by Whisper Large v3
 (`openai/whisper-large-v3`) through an OpenAI-compatible
@@ -61,7 +65,7 @@ AI-server compose snippet for Whisper Large v3 lives in

 In Kubernetes the dedicated transcription worker may claim more than one
 `whisper-large-v3` job at a time. This keeps download/upload/wait overhead from
-serializing the queue while Whisper/vLLM still controls the actual GPU
+serializing the queue while the Whisper provider still controls the actual GPU
 scheduling.

 ## API
@@ -113,8 +117,10 @@ for Kubernetes probes.
 - `WORKER_CLAIM_LIMIT`, default `4`
 - `WORKER_LEASE_TIMEOUT`, default `15m`

-## Next integration step
+## Current telephony pipeline

-`telephony` should first mirror low-risk analysis jobs into this service while
-continuing local processing. Remote execution can then be enabled by feature
-flag per task type.
+`telephony` now uses AI Service as the only AI execution path:
+
+1. `transcription` turns call audio into segments.
+2. `transcript_summary` creates a detailed Russian call summary.
+3. `call_analysis` runs tags and negotiation rules against the summary.