AI Service
Technical AI job service for Portal workloads.
AI Service owns technical AI job lifecycle, provider execution and metrics.
Business data stays in domain services such as telephony, monitoring-tg and
monitoring-pf.
Generic job contract
The service is intentionally domain-agnostic:
- owner_service names the caller, for example telephony, monitoring-tg, monitoring-pf or a future Portal module.
- owner_ref is the caller's stable object reference, for example beeline/{call_id} or channel/{message_id}.
- task_type describes the technical task class, for example transcription, transcript_summary, call_analysis, telegram_classification, tg_analysis, pf_competitor_analysis.
- model_profile selects a runtime profile, for example whisper-large-v3, qwen2.5-14b, vision, or a future provider profile.
- input and result are JSON payloads owned by the caller and worker.
This keeps AI Service a piece of shared infrastructure rather than a telephony-specific service.
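As an illustration of the contract, a caller could assemble a job body along these lines. This is a minimal sketch: the field names come from the list above, but the exact request body accepted by POST /api/v1/jobs is not specified here, and build_job and the call id 12345 are hypothetical.

```python
import json


def build_job(owner_service, owner_ref, task_type, model_profile, input_payload):
    """Assemble a job body from the generic contract fields (assumed shape)."""
    return {
        "owner_service": owner_service,
        "owner_ref": owner_ref,
        "task_type": task_type,
        "model_profile": model_profile,
        "input": input_payload,
    }


job = build_job(
    owner_service="telephony",
    owner_ref="beeline/12345",  # hypothetical call id
    task_type="transcript_summary",
    model_profile="qwen2.5-14b",
    input_payload={"system": "Summarize the call.", "user": "<transcript text>"},
)
print(json.dumps(job, indent=2))
```

Nothing in the body is telephony-specific apart from the values the caller chose, which is the point of the domain-agnostic contract.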
Built-in workers
The LLM worker processes llm_chat, chat_completion, call_analysis,
transcript_summary and telegram_classification jobs whose model_profile
equals LLM_MODEL.
Input can be either explicit messages:
{
  "messages": [
    {"role": "system", "content": "Answer as JSON."},
    {"role": "user", "content": "Classify this text"}
  ],
  "max_tokens": 256
}
or compact system / user fields. The completed job result contains
schema_version=ai.chat_result.v1, content, model, usage and
duration_ms.
call_analysis and transcript_summary use the same input contract as
llm_chat; callers may include domain metadata fields in input, but the
worker only reads chat fields such as system, user, messages,
max_tokens and response_format.
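The input handling described above can be sketched as follows. This is an assumption-laden illustration, not the worker's actual code: normalize_chat_input is a hypothetical name, and giving explicit messages precedence over the compact system / user fields is a guess.

```python
# Chat options the worker is described as reading, besides the message fields.
CHAT_OPTIONS = ("max_tokens", "response_format")


def normalize_chat_input(payload):
    """Keep only chat fields; expand compact system/user into messages."""
    if "messages" in payload:
        # Assumption: explicit messages win over compact fields.
        messages = list(payload["messages"])
    else:
        messages = []
        if payload.get("system"):
            messages.append({"role": "system", "content": payload["system"]})
        if payload.get("user"):
            messages.append({"role": "user", "content": payload["user"]})
    request = {"messages": messages}
    for option in CHAT_OPTIONS:
        if option in payload:
            request[option] = payload[option]
    return request


# Domain metadata such as call_id is ignored by the worker in this sketch.
request = normalize_chat_input(
    {"system": "Answer as JSON.", "user": "Classify this text", "call_id": 7, "max_tokens": 256}
)
print(request)
```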
transcription jobs are processed only by Whisper Large v3
(openai/whisper-large-v3) through an OpenAI-compatible
/v1/audio/transcriptions endpoint. The returned segments field stays
compatible with telephony. If the provider returns one long segment, AI Service
splits it into smaller transcript segments without inventing speaker labels.
The completed job result contains
schema_version=ai.transcription_result.v1, provider, model, language,
segments, optional provider attempts and duration_ms.
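The splitting rule is not described in detail, so the following is only one plausible approach. It assumes each segment is a dict with start and end times in seconds plus text, cuts on sentence boundaries, and distributes the time span in proportion to text length. split_segment and max_chars are hypothetical names, and no speaker field is added.

```python
import re


def split_segment(segment, max_chars=200):
    """Split one long segment into sentence-based pieces (illustrative only)."""
    text = segment["text"].strip()
    if len(text) <= max_chars:
        return [segment]

    # Group whole sentences into chunks of at most max_chars where possible.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)

    # Spread the original time span across chunks by character count.
    total_chars = sum(len(chunk) for chunk in chunks)
    duration = segment["end"] - segment["start"]
    pieces, cursor = [], segment["start"]
    for chunk in chunks:
        span = duration * len(chunk) / total_chars
        # Copy existing fields as-is; nothing new (such as a speaker) is invented.
        pieces.append(dict(segment, start=round(cursor, 3), end=round(cursor + span, 3), text=chunk))
        cursor += span
    return pieces


parts = split_segment(
    {"start": 0.0, "end": 10.0, "text": "One two three. Four five six."}, max_chars=15
)
```

Timestamps produced this way are estimates, since the provider gave no finer timing than the original segment.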
AI-server compose snippet for Whisper Large v3 lives in
deploy/ai-server/docker-compose.audio.yml:
- Whisper endpoint: http://10.2.3.5:8004
- Start Whisper: docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile whisper-large-v3 up -d whisper-large-v3
In Kubernetes the dedicated transcription worker may claim more than one
whisper-large-v3 job at a time. This keeps download/upload/wait overhead from
serializing the queue while the Whisper provider still controls the actual GPU
scheduling.
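To illustrate why claiming several jobs helps, the sketch below runs the handlers for a claimed batch concurrently, so the I/O waits of one job overlap with the others. It is a hedged example rather than the worker's real loop: process_claimed and fake_transcribe are hypothetical, and the pool size of 4 simply mirrors the WORKER_CLAIM_LIMIT default.

```python
from concurrent.futures import ThreadPoolExecutor


def process_claimed(jobs, handle, max_workers=4):
    """Run handle(job) for each claimed job concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(handle, jobs))


def fake_transcribe(job):
    # Stand-in for download, provider call and upload; the real work is I/O bound.
    return {"id": job["id"], "status": "completed"}


results = process_claimed([{"id": 1}, {"id": 2}, {"id": 3}], fake_transcribe)
```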
API
- POST /api/v1/jobs creates one job.
- GET /api/v1/jobs lists jobs with query filters.
- POST /api/v1/jobs/batch creates many jobs with shared defaults.
- POST /api/v1/jobs/retry retries failed/running jobs by filter.
- POST /api/v1/jobs/cancel cancels pending/running jobs by filter.
- POST /api/v1/jobs/claim atomically claims pending jobs for a worker.
- GET /api/v1/jobs/{id} returns technical job state and result.
- POST /api/v1/jobs/{id}/complete stores a successful job result.
- POST /api/v1/jobs/{id}/fail stores a failed job category and message.
- POST /api/v1/jobs/{id}/retry resets failed/running jobs to pending.
- GET /api/v1/stats returns queue and error counters.
- GET /api/v1/providers/status checks configured AI providers without returning secrets.
- GET /api/v1/infra/status returns AI-server sidecar telemetry (GPU, containers and vLLM live metrics) when configured.
- GET /health/detail returns PostgreSQL, provider, queue, error, throughput and infra components for Portal admin/health.
- GET /healthz returns process health.
- GET /readyz checks PostgreSQL readiness.
- Built-in workers expose open Kubernetes endpoints on WORKER_HTTP_PORT: GET /healthz, GET /readyz and GET /worker/status.
All /api/v1/* endpoints require Authorization: Bearer <AI_SERVICE_TOKEN>
when AI_SERVICE_TOKEN is configured. Health and readiness endpoints stay open
for Kubernetes probes.
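A client might attach the bearer token as sketched below. The example only builds the request and does not send it. build_request and the host name ai-service are hypothetical, and the token is read from wherever the caller keeps AI_SERVICE_TOKEN.

```python
import json
import urllib.request


def build_request(base_url, path, token=None, body=None):
    """Build a GET (no body) or POST (JSON body) request with optional bearer auth."""
    headers = {"Accept": "application/json"}
    if token:
        # Only sent when a token is configured, matching the optional setting.
        headers["Authorization"] = f"Bearer {token}"
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        headers=headers,
        method="POST" if body is not None else "GET",
    )


stats_request = build_request("http://ai-service:8080", "/api/v1/stats", token="secret")
probe_request = build_request("http://ai-service:8080", "/healthz")
```

Passing the result to urllib.request.urlopen would perform the call; the probe request carries no Authorization header because health endpoints stay open.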
Retry policy
Workers store a normalized error_code on failed jobs. AI Service requeues only
explicitly retryable categories while attempts remain.
| Category | Retry | Delay |
|---|---|---|
| provider_unavailable, model_unavailable, provider_error, dependency_error, timeout, storage_error, stale_worker | yes | 30s |
| bad_response, transcript_hallucination, transcript_incomplete, internal_error, unknown | yes | 2m |
| bad_audio, bad_input, context_length, unsupported_task, cancelled | no | - |
Domain services may still expose manual retry for terminal errors after the underlying data or prompt is corrected.
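The table can be read as a lookup from error_code to a requeue delay. The sketch below encodes it that way. retry_delay is a hypothetical helper, and the attempts / max_attempts parameters are an assumed way to express "while attempts remain". Codes that are not listed as retryable are treated as terminal.

```python
from datetime import timedelta

SHORT_DELAY = timedelta(seconds=30)
LONG_DELAY = timedelta(minutes=2)

# Retryable categories and their delays, copied from the table above.
RETRY_DELAYS = {
    **dict.fromkeys(
        ("provider_unavailable", "model_unavailable", "provider_error",
         "dependency_error", "timeout", "storage_error", "stale_worker"),
        SHORT_DELAY,
    ),
    **dict.fromkeys(
        ("bad_response", "transcript_hallucination", "transcript_incomplete",
         "internal_error", "unknown"),
        LONG_DELAY,
    ),
}


def retry_delay(error_code, attempts, max_attempts):
    """Return the requeue delay, or None when the failure is terminal."""
    if attempts >= max_attempts:
        return None  # no attempts remain
    # Only explicitly retryable categories are requeued; everything else is terminal.
    return RETRY_DELAYS.get(error_code)
```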
Result schemas
AI Service result payloads are versioned with schema_version. Consumers should
ignore unknown fields and reject only unsupported major schema names.
Current schemas:
- ai.chat_result.v1: {schema_version, content, model, usage?, duration_ms}.
- ai.transcription_result.v1: {schema_version, provider?, model?, attempts?, language, segments, duration_ms}.
New optional fields may be added to a v1 schema without a breaking change.
Breaking shape changes require a new schema name.
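A consumer following these rules could validate results roughly as below. This is a sketch under assumptions: check_result is a hypothetical name, and fields without a ? in the schema list are treated as required.

```python
# Required fields per supported schema; optional (?) fields are not listed.
REQUIRED_FIELDS = {
    "ai.chat_result.v1": ("content", "model", "duration_ms"),
    "ai.transcription_result.v1": ("language", "segments", "duration_ms"),
}


def check_result(payload):
    """Accept supported schemas, tolerate unknown fields, reject everything else."""
    schema = payload.get("schema_version")
    required = REQUIRED_FIELDS.get(schema)
    if required is None:
        raise ValueError(f"unsupported result schema: {schema!r}")
    missing = [field for field in required if field not in payload]
    if missing:
        raise ValueError(f"{schema} result is missing fields: {missing}")
    # Unknown extra fields are deliberately left alone.
    return payload


chat = check_result({
    "schema_version": "ai.chat_result.v1",
    "content": "{}",
    "model": "qwen2.5-14b",
    "duration_ms": 840,
    "some_future_field": True,  # ignored, not rejected
})
```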
Configuration
- HTTP_HOST, default 0.0.0.0
- HTTP_PORT, default 8080
- DATABASE_URL, required
- MIGRATE_ON_START, default true
- AI_SERVICE_TOKEN, optional bearer token for service-to-service API calls
- LLM_BASE_URL, primary OpenAI-compatible LLM endpoint
- LLM_API_KEY, primary LLM API key
- LLM_MODEL, default qwen2.5-14b
- LLM_TIMEOUT, default 5m
- AUDIO_TRANSCRIPTION_BASE_URL, OpenAI-compatible transcription endpoint
- AUDIO_TRANSCRIPTION_MODEL, default openai/whisper-large-v3
- AUDIO_TRANSCRIPTION_API_KEY, optional bearer token; falls back to AUDIO_LLM_API_KEY, then LLM_API_KEY
- AUDIO_TRANSCRIPTION_PROMPT, transcription instruction
- WORKER_ID, default hostname
- WORKER_HTTP_HOST, default 0.0.0.0
- WORKER_HTTP_PORT, default 8081
- WORKER_POLL_INTERVAL, default 2s
- WORKER_CLAIM_LIMIT, default 4
- WORKER_LEASE_TIMEOUT, default 15m
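The defaults and the API key fallback chain can be illustrated with a small loader. This is a sketch, not the service's own configuration code: load_config and parse_duration are hypothetical, only a subset of the variables is shown, and the duration parser assumes single-unit values such as 2s, 5m or 15m.

```python
import os
import re
from datetime import timedelta

_UNITS = {"ms": "milliseconds", "s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(value):
    """Parse single-unit durations like '2s' or '15m' (assumed syntax)."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", value.strip())
    if not match:
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def load_config(env=None):
    """Read a subset of the settings above, applying the documented defaults."""
    env = os.environ if env is None else env
    if not env.get("DATABASE_URL"):
        raise RuntimeError("DATABASE_URL is required")
    return {
        "database_url": env["DATABASE_URL"],
        "http_port": int(env.get("HTTP_PORT", "8080")),
        "llm_model": env.get("LLM_MODEL", "qwen2.5-14b"),
        "llm_timeout": parse_duration(env.get("LLM_TIMEOUT", "5m")),
        "worker_claim_limit": int(env.get("WORKER_CLAIM_LIMIT", "4")),
        "worker_lease_timeout": parse_duration(env.get("WORKER_LEASE_TIMEOUT", "15m")),
        # Fallback chain: transcription key, then audio LLM key, then primary LLM key.
        "audio_api_key": env.get("AUDIO_TRANSCRIPTION_API_KEY")
        or env.get("AUDIO_LLM_API_KEY")
        or env.get("LLM_API_KEY"),
    }


config = load_config({"DATABASE_URL": "postgres://example", "LLM_API_KEY": "primary-key"})
```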
Current telephony pipeline
telephony now uses AI Service as the only AI execution path:
- transcription turns call audio into segments.
- transcript_summary creates a detailed Russian call summary.
- call_analysis runs tags and negotiation rules against the summary.