Refresh AI service pipeline docs
Some checks failed
CI / test (push) Failing after 8s
Build and Deploy / build-and-deploy (push) Successful in 27s

Grendgi
2026-06-10 13:43:59 +03:00
parent 7d0e27f681
commit 80fa21ff80

@@ -2,8 +2,9 @@
 Technical AI job service for Portal workloads.
-The first version owns only AI job lifecycle and metrics. Business data stays in
-domain services such as `telephony`, `monitoring-tg` and `monitoring-pf`.
+AI Service owns technical AI job lifecycle, provider execution and metrics.
+Business data stays in domain services such as `telephony`, `monitoring-tg` and
+`monitoring-pf`.
 ## Generic job contract
@@ -14,7 +15,8 @@ The service is intentionally domain-agnostic:
 - `owner_ref` is the caller's stable object reference, for example
 `beeline/{call_id}` or `channel/{message_id}`.
 - `task_type` describes the technical task class, for example
-`transcribe`, `call_analysis`, `tg_analysis`, `pf_competitor_analysis`.
+`transcription`, `transcript_summary`, `call_analysis`,
+`telegram_classification`, `tg_analysis`, `pf_competitor_analysis`.
 - `model_profile` selects a runtime profile, for example `whisper-large-v3`,
 `qwen2.5-14b`, `vision`, or a future provider profile.
 - `input` and `result` are JSON payloads owned by the caller and worker.
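
The generic job contract in this hunk can be illustrated with a small Python sketch. Only the field names `owner_ref`, `task_type`, `model_profile`, `input` and `result` come from the README; the `AIJob` class, its `validate` method and the sample values are hypothetical, and the real service may serialize or validate jobs differently.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class AIJob:
    """Hypothetical client-side view of a job; not the service's own model."""

    owner_ref: str       # caller's stable object reference, e.g. "beeline/{call_id}"
    task_type: str       # technical task class, e.g. "transcript_summary"
    model_profile: str   # runtime profile, e.g. "qwen2.5-14b"
    input: Dict[str, Any] = field(default_factory=dict)
    result: Optional[Dict[str, Any]] = None  # assumed to be filled in by the worker

    def validate(self) -> None:
        # Assumption: all three routing fields must be non-empty strings.
        for name in ("owner_ref", "task_type", "model_profile"):
            if not getattr(self, name):
                raise ValueError(f"{name} must not be empty")
        # `input` is a JSON payload, so it must be JSON-serializable.
        json.dumps(self.input)


job = AIJob(
    owner_ref="beeline/12345",  # illustrative call id
    task_type="transcript_summary",
    model_profile="qwen2.5-14b",
    input={"system": "Summarize the call.", "user": "transcript text", "max_tokens": 512},
)
job.validate()
payload = json.dumps(asdict(job))
```

Because `input` and `result` stay opaque dictionaries, the sketch keeps the same domain-agnostic shape the README describes: routing depends only on `task_type` and `model_profile`.
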
@@ -24,8 +26,9 @@ service.
 ## Built-in workers
-The first built-in worker processes `llm_chat`, `chat_completion` and
-`call_analysis` jobs whose `model_profile` equals `LLM_MODEL`.
+The LLM worker processes `llm_chat`, `chat_completion`, `call_analysis`,
+`transcript_summary` and `telegram_classification` jobs whose `model_profile`
+equals `LLM_MODEL`.
 Input can be either explicit messages:
@@ -42,9 +45,10 @@ Input can be either explicit messages:
 or compact `system` / `user` fields. The completed job result contains
 `content`, `model`, `usage` and `duration_ms`.
-`call_analysis` uses the same input contract as `llm_chat`; callers may include
-domain metadata fields in `input`, but the worker only reads chat fields such as
-`system`, `user`, `messages`, `max_tokens` and `response_format`.
+`call_analysis` and `transcript_summary` use the same input contract as
+`llm_chat`; callers may include domain metadata fields in `input`, but the
+worker only reads chat fields such as `system`, `user`, `messages`,
+`max_tokens` and `response_format`.
 `transcription` jobs are processed only by Whisper Large v3
 (`openai/whisper-large-v3`) through an OpenAI-compatible
@@ -61,7 +65,7 @@ AI-server compose snippet for Whisper Large v3 lives in
 In Kubernetes the dedicated transcription worker may claim more than one
 `whisper-large-v3` job at a time. This keeps download/upload/wait overhead from
-serializing the queue while Whisper/vLLM still controls the actual GPU
+serializing the queue while the Whisper provider still controls the actual GPU
 scheduling.
 ## API
@@ -113,8 +117,10 @@ for Kubernetes probes.
 - `WORKER_CLAIM_LIMIT`, default `4`
 - `WORKER_LEASE_TIMEOUT`, default `15m`
-## Next integration step
+## Current telephony pipeline
-`telephony` should first mirror low-risk analysis jobs into this service while
-continuing local processing. Remote execution can then be enabled by feature
-flag per task type.
+`telephony` now uses AI Service as the only AI execution path:
+1. `transcription` turns call audio into segments.
+2. `transcript_summary` creates a detailed Russian call summary.
+3. `call_analysis` runs tags and negotiation rules against the summary.
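
The three-step telephony pipeline added in this hunk could be chained roughly as in the following Python sketch. It is an illustration under assumptions, not the `telephony` implementation: the `submit` callable (submit a job and wait for its result), the `audio_url` input field, the `segments` result shape and the prompt strings are all hypothetical. Only the task types, the `system` / `user` input fields, the `content` result field and the `beeline/{call_id}` reference format come from the README.

```python
from typing import Any, Callable, Dict

Job = Dict[str, Any]
# Hypothetical transport: submits one job and blocks until its result is ready.
Submit = Callable[[Job], Dict[str, Any]]


def run_call_pipeline(call_id: str, audio_url: str, llm_profile: str, submit: Submit) -> Dict[str, Any]:
    owner_ref = f"beeline/{call_id}"

    # Step 1: transcription turns call audio into segments.
    transcript = submit({
        "owner_ref": owner_ref,
        "task_type": "transcription",
        "model_profile": "whisper-large-v3",
        "input": {"audio_url": audio_url},  # assumed input field
    })
    text = "\n".join(seg["text"] for seg in transcript["segments"])  # assumed result shape

    # Step 2: transcript_summary produces a detailed Russian summary.
    summary = submit({
        "owner_ref": owner_ref,
        "task_type": "transcript_summary",
        "model_profile": llm_profile,
        "input": {"system": "Summarize the call in detail, in Russian.", "user": text},
    })

    # Step 3: call_analysis applies tags and negotiation rules to the summary.
    analysis = submit({
        "owner_ref": owner_ref,
        "task_type": "call_analysis",
        "model_profile": llm_profile,
        "input": {"system": "Apply the tag and negotiation rules.", "user": summary["content"]},
    })
    return {"summary": summary["content"], "analysis": analysis["content"]}


def fake_submit(job: Job) -> Dict[str, Any]:
    """In-memory stand-in for the service, used only to exercise the chaining."""
    if job["task_type"] == "transcription":
        return {"segments": [{"text": "hello"}, {"text": "world"}]}
    return {"content": f"{job['task_type']}:{job['input']['user']}"}


result = run_call_pipeline("12345", "https://example.invalid/call.wav", "qwen2.5-14b", fake_submit)
```

Passing `submit` in as a parameter keeps the chaining logic independent of how jobs are actually created and polled, which the diff does not show.
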