Refresh AI service pipeline docs
This commit is contained in:
32
README.md
32
README.md
@@ -2,8 +2,9 @@
|
|||||||
|
|
||||||
Technical AI job service for Portal workloads.
|
Technical AI job service for Portal workloads.
|
||||||
|
|
||||||
The first version owns only AI job lifecycle and metrics. Business data stays in
|
AI Service owns technical AI job lifecycle, provider execution and metrics.
|
||||||
domain services such as `telephony`, `monitoring-tg` and `monitoring-pf`.
|
Business data stays in domain services such as `telephony`, `monitoring-tg` and
|
||||||
|
`monitoring-pf`.
|
||||||
|
|
||||||
## Generic job contract
|
## Generic job contract
|
||||||
|
|
||||||
@@ -14,7 +15,8 @@ The service is intentionally domain-agnostic:
|
|||||||
- `owner_ref` is the caller's stable object reference, for example
|
- `owner_ref` is the caller's stable object reference, for example
|
||||||
`beeline/{call_id}` or `channel/{message_id}`.
|
`beeline/{call_id}` or `channel/{message_id}`.
|
||||||
- `task_type` describes the technical task class, for example
|
- `task_type` describes the technical task class, for example
|
||||||
`transcribe`, `call_analysis`, `tg_analysis`, `pf_competitor_analysis`.
|
`transcription`, `transcript_summary`, `call_analysis`,
|
||||||
|
`telegram_classification`, `tg_analysis`, `pf_competitor_analysis`.
|
||||||
- `model_profile` selects a runtime profile, for example `whisper-large-v3`,
|
- `model_profile` selects a runtime profile, for example `whisper-large-v3`,
|
||||||
`qwen2.5-14b`, `vision`, or a future provider profile.
|
`qwen2.5-14b`, `vision`, or a future provider profile.
|
||||||
- `input` and `result` are JSON payloads owned by the caller and worker.
|
- `input` and `result` are JSON payloads owned by the caller and worker.
|
||||||
@@ -24,8 +26,9 @@ service.
|
|||||||
|
|
||||||
## Built-in workers
|
## Built-in workers
|
||||||
|
|
||||||
The first built-in worker processes `llm_chat`, `chat_completion` and
|
The LLM worker processes `llm_chat`, `chat_completion`, `call_analysis`,
|
||||||
`call_analysis` jobs whose `model_profile` equals `LLM_MODEL`.
|
`transcript_summary` and `telegram_classification` jobs whose `model_profile`
|
||||||
|
equals `LLM_MODEL`.
|
||||||
|
|
||||||
Input can be either explicit messages:
|
Input can be either explicit messages:
|
||||||
|
|
||||||
@@ -42,9 +45,10 @@ Input can be either explicit messages:
|
|||||||
or compact `system` / `user` fields. The completed job result contains
|
or compact `system` / `user` fields. The completed job result contains
|
||||||
`content`, `model`, `usage` and `duration_ms`.
|
`content`, `model`, `usage` and `duration_ms`.
|
||||||
|
|
||||||
`call_analysis` uses the same input contract as `llm_chat`; callers may include
|
`call_analysis` and `transcript_summary` use the same input contract as
|
||||||
domain metadata fields in `input`, but the worker only reads chat fields such as
|
`llm_chat`; callers may include domain metadata fields in `input`, but the
|
||||||
`system`, `user`, `messages`, `max_tokens` and `response_format`.
|
worker only reads chat fields such as `system`, `user`, `messages`,
|
||||||
|
`max_tokens` and `response_format`.
|
||||||
|
|
||||||
`transcription` jobs are processed only by Whisper Large v3
|
`transcription` jobs are processed only by Whisper Large v3
|
||||||
(`openai/whisper-large-v3`) through an OpenAI-compatible
|
(`openai/whisper-large-v3`) through an OpenAI-compatible
|
||||||
@@ -61,7 +65,7 @@ AI-server compose snippet for Whisper Large v3 lives in
|
|||||||
|
|
||||||
In Kubernetes the dedicated transcription worker may claim more than one
|
In Kubernetes the dedicated transcription worker may claim more than one
|
||||||
`whisper-large-v3` job at a time. This keeps download/upload/wait overhead from
|
`whisper-large-v3` job at a time. This keeps download/upload/wait overhead from
|
||||||
serializing the queue while Whisper/vLLM still controls the actual GPU
|
serializing the queue while the Whisper provider still controls the actual GPU
|
||||||
scheduling.
|
scheduling.
|
||||||
|
|
||||||
## API
|
## API
|
||||||
@@ -113,8 +117,10 @@ for Kubernetes probes.
|
|||||||
- `WORKER_CLAIM_LIMIT`, default `4`
|
- `WORKER_CLAIM_LIMIT`, default `4`
|
||||||
- `WORKER_LEASE_TIMEOUT`, default `15m`
|
- `WORKER_LEASE_TIMEOUT`, default `15m`
|
||||||
|
|
||||||
## Next integration step
|
## Current telephony pipeline
|
||||||
|
|
||||||
`telephony` should first mirror low-risk analysis jobs into this service while
|
`telephony` now uses AI Service as the only AI execution path:
|
||||||
continuing local processing. Remote execution can then be enabled by feature
|
|
||||||
flag per task type.
|
1. `transcription` turns call audio into segments.
|
||||||
|
2. `transcript_summary` creates a detailed Russian call summary.
|
||||||
|
3. `call_analysis` runs tags and negotiation rules against the summary.
|
||||||
|
|||||||
Reference in New Issue
Block a user