AI Service
Technical AI job service for Portal workloads.
The first version owns only AI job lifecycle and metrics. Business data stays in
domain services such as telephony, monitoring-tg and monitoring-pf.
Generic job contract
The service is intentionally domain-agnostic:
- owner_service names the caller, for example telephony, monitoring-tg, monitoring-pf or a future Portal module.
- owner_ref is the caller's stable object reference, for example beeline/{call_id} or channel/{message_id}.
- task_type describes the technical task class, for example transcribe, call_analysis, tg_analysis, pf_competitor_analysis.
- model_profile selects a runtime profile, for example whisper-large-v3, qwen2.5-14b, vision, or a future provider profile.
- input and result are JSON payloads owned by the caller and worker.
This keeps AI Service shared infrastructure rather than a telephony-specific service.
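As an illustration of the contract above, a caller might assemble a job body like this. It is a minimal sketch: the field names come from the list above, but the helper build_job, the rule that the four identifying fields must be non-empty, and the sample call id are assumptions, not the service's documented request schema.

```python
import json
from typing import Any


def build_job(owner_service: str, owner_ref: str, task_type: str,
              model_profile: str, input_payload: dict[str, Any]) -> dict[str, Any]:
    """Assemble a job body from the generic contract fields (hypothetical helper)."""
    # Assumption: the four identifying fields are mandatory and non-empty.
    for name, value in (("owner_service", owner_service),
                        ("owner_ref", owner_ref),
                        ("task_type", task_type),
                        ("model_profile", model_profile)):
        if not value:
            raise ValueError(f"{name} must not be empty")
    return {
        "owner_service": owner_service,
        "owner_ref": owner_ref,
        "task_type": task_type,
        "model_profile": model_profile,
        # The payload is opaque to the service; caller and worker own its shape.
        "input": input_payload,
    }


# "beeline/12345" is a made-up reference following the beeline/{call_id} pattern.
job = build_job("telephony", "beeline/12345", "call_analysis", "qwen2.5-14b",
                {"user": "Classify this text"})
print(json.dumps(job, indent=2))
```

Nothing in the body is telephony-specific except the values, which is the point of keeping the contract generic.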
Built-in workers
The first built-in worker processes llm_chat, chat_completion and
call_analysis jobs whose model_profile equals LLM_MODEL.
Input can be either explicit messages:
    {
      "messages": [
        {"role": "system", "content": "Answer as JSON."},
        {"role": "user", "content": "Classify this text"}
      ],
      "max_tokens": 256
    }
or compact system / user fields. The completed job result contains
content, model, usage and duration_ms.
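The two input shapes can be reduced to one message list before calling the model. The sketch below shows one plausible normalization. The function name to_messages and the precedence rule (explicit messages win over compact fields) are assumptions; the built-in worker's actual behavior is not described here.

```python
from typing import Any


def to_messages(payload: dict[str, Any]) -> list[dict[str, str]]:
    """Return chat messages from explicit `messages` or compact `system`/`user` fields."""
    messages = payload.get("messages")
    if messages:
        # Assumption: explicit messages take precedence over compact fields.
        return list(messages)
    result = []
    if payload.get("system"):
        result.append({"role": "system", "content": payload["system"]})
    if payload.get("user"):
        result.append({"role": "user", "content": payload["user"]})
    if not result:
        raise ValueError("input needs `messages` or a `system`/`user` field")
    return result


print(to_messages({"system": "Answer as JSON.", "user": "Classify this text"}))
```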
call_analysis uses the same input contract as llm_chat; callers may include
domain metadata fields in input, but the worker only reads chat fields such as
system, user, messages, max_tokens and response_format.
transcription jobs are processed only by Whisper Large v3
(openai/whisper-large-v3) through an OpenAI-compatible
/v1/audio/transcriptions endpoint. The returned segments field stays
compatible with telephony. If the provider returns one long segment, AI Service
splits it into smaller transcript segments without inventing speaker labels.
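To make the splitting step concrete, here is one way it could work. This is a sketch under stated assumptions, not the service's algorithm: it assumes segments carry start, end and text fields, splits on sentence punctuation, and assigns time in proportion to character count. The name split_segment and the max_chars threshold are hypothetical.

```python
import re


def split_segment(segment: dict, max_chars: int = 200) -> list[dict]:
    """Split one long segment into smaller ones without adding speaker labels."""
    text = segment["text"].strip()
    if len(text) <= max_chars:
        return [dict(segment, text=text)]

    # Split after sentence-ending punctuation, keeping it with its sentence.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

    # Greedily pack sentences into chunks of at most max_chars.
    # A single sentence longer than max_chars stays in one chunk.
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)

    # Assumption: time is distributed proportionally to chunk length.
    total_chars = sum(len(c) for c in chunks)
    duration = segment["end"] - segment["start"]
    result, cursor = [], segment["start"]
    for index, chunk in enumerate(chunks):
        is_last = index == len(chunks) - 1
        end = segment["end"] if is_last else cursor + duration * len(chunk) / total_chars
        result.append({"start": round(cursor, 3), "end": round(end, 3), "text": chunk})
        cursor = end
    return result
```

The output contains only timing and text, so no speaker attribution is invented; the estimated boundaries are approximations, not alignments.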
AI-server compose snippet for Whisper Large v3 lives in
deploy/ai-server/docker-compose.audio.yml:
- Whisper endpoint: http://10.2.3.5:8004
- Start Whisper: docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile whisper-large-v3 up -d whisper-large-v3
In Kubernetes the dedicated transcription worker may claim more than one
whisper-large-v3 job at a time. This keeps download/upload/wait overhead from
serializing the queue while Whisper/vLLM still controls the actual GPU
scheduling.
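The idea of claiming several jobs and overlapping their I/O can be sketched as follows. The callables claim, process, complete and fail are hypothetical stand-ins for calls to the claim, complete and fail endpoints; the job shape with an id key is also an assumption. The default limit of 4 mirrors the documented WORKER_CLAIM_LIMIT default.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_once(claim: Callable[[int], list[dict]],
             process: Callable[[dict], Any],
             complete: Callable[[Any, Any], None],
             fail: Callable[[Any, str], None],
             limit: int = 4) -> int:
    """Claim up to `limit` jobs, process them concurrently, return the claimed count."""
    jobs = claim(limit)
    if not jobs:
        return 0

    def handle(job: dict) -> None:
        try:
            result = process(job)
        except Exception as exc:
            # Report the failure instead of letting one job stop the batch.
            fail(job["id"], str(exc))
        else:
            complete(job["id"], result)

    # Threads overlap download/upload/wait time; the provider still
    # decides how requests are scheduled on the GPU.
    with ThreadPoolExecutor(max_workers=limit) as pool:
        list(pool.map(handle, jobs))
    return len(jobs)
```

A real worker would call this in a loop, sleeping for the poll interval whenever it returns 0.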
API
- POST /api/v1/jobs creates one job.
- GET /api/v1/jobs lists jobs with query filters.
- POST /api/v1/jobs/batch creates many jobs with shared defaults.
- POST /api/v1/jobs/retry retries failed/running jobs by filter.
- POST /api/v1/jobs/cancel cancels pending/running jobs by filter.
- POST /api/v1/jobs/claim atomically claims pending jobs for a worker.
- GET /api/v1/jobs/{id} returns technical job state and result.
- POST /api/v1/jobs/{id}/complete stores a successful job result.
- POST /api/v1/jobs/{id}/fail stores a failed job category and message.
- POST /api/v1/jobs/{id}/retry resets failed/running jobs to pending.
- GET /api/v1/stats returns queue and error counters.
- GET /api/v1/providers/status checks configured AI providers without returning secrets.
- GET /api/v1/infra/status returns AI-server sidecar telemetry (GPU, containers and vLLM live metrics) when configured.
- GET /healthz returns process health.
- GET /readyz checks PostgreSQL readiness.
- Built-in workers expose open Kubernetes endpoints on WORKER_HTTP_PORT: GET /healthz, GET /readyz and GET /worker/status.
All /api/v1/* endpoints require Authorization: Bearer <AI_SERVICE_TOKEN>
when AI_SERVICE_TOKEN is configured. Health and readiness endpoints stay open
for Kubernetes probes.
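The bearer-token rule above can be illustrated with a small client helper. This is a sketch using only the Python standard library; the helper name api_request and the host ai-service:8080 are assumptions. It builds the request without sending it.

```python
import json
import urllib.request
from typing import Any, Optional


def api_request(base_url: str, path: str, method: str = "GET",
                token: Optional[str] = None,
                body: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    """Build a request, adding the bearer token only when one is configured."""
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Authenticated API call (the token value is a placeholder).
stats = api_request("http://ai-service:8080", "/api/v1/stats", token="secret")
# Probe endpoints stay open, so no token is attached.
health = api_request("http://ai-service:8080", "/healthz")
print(stats.get_header("Authorization"), health.has_header("Authorization"))
```

Passing the result to urllib.request.urlopen would send it; that step is left out so the example runs without a live service.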
Configuration
- HTTP_HOST, default 0.0.0.0
- HTTP_PORT, default 8080
- DATABASE_URL, required
- MIGRATE_ON_START, default true
- AI_SERVICE_TOKEN, optional bearer token for service-to-service API calls
- LLM_BASE_URL, primary OpenAI-compatible LLM endpoint
- LLM_API_KEY, primary LLM API key
- LLM_MODEL, default qwen2.5-14b
- LLM_TIMEOUT, default 5m
- AUDIO_TRANSCRIPTION_BASE_URL, OpenAI-compatible transcription endpoint
- AUDIO_TRANSCRIPTION_MODEL, default openai/whisper-large-v3
- AUDIO_TRANSCRIPTION_API_KEY, optional bearer token; falls back to AUDIO_LLM_API_KEY, then LLM_API_KEY
- AUDIO_TRANSCRIPTION_PROMPT, transcription instruction
- WORKER_ID, default hostname
- WORKER_HTTP_HOST, default 0.0.0.0
- WORKER_HTTP_PORT, default 8081
- WORKER_POLL_INTERVAL, default 2s
- WORKER_CLAIM_LIMIT, default 4
- WORKER_LEASE_TIMEOUT, default 15m
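The defaults and the API-key fallback chain can be modeled as below. This sketch covers a subset of the variables. The names parse_duration and load_config are hypothetical, and the duration syntax (values such as 2s, 5m, 1h30m) is an assumption inferred from the listed defaults, not a description of the service's own parser.

```python
import os
import re
import socket
from collections.abc import Mapping
from datetime import timedelta

_UNITS = {"ms": "milliseconds", "s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(value: str) -> timedelta:
    """Parse simple durations such as '2s', '5m' or '1h30m' (assumed syntax)."""
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|s|m|h)", value)
    # Reject input with leftovers the pattern did not consume, e.g. '5mins'.
    if not parts or "".join(number + unit for number, unit in parts) != value:
        raise ValueError(f"unsupported duration: {value!r}")
    total = timedelta()
    for number, unit in parts:
        total += timedelta(**{_UNITS[unit]: float(number)})
    return total


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read a subset of the settings, applying the documented defaults."""
    if not env.get("DATABASE_URL"):
        raise ValueError("DATABASE_URL is required")
    return {
        "http_host": env.get("HTTP_HOST", "0.0.0.0"),
        "http_port": int(env.get("HTTP_PORT", "8080")),
        "database_url": env["DATABASE_URL"],
        "llm_model": env.get("LLM_MODEL", "qwen2.5-14b"),
        "llm_timeout": parse_duration(env.get("LLM_TIMEOUT", "5m")),
        # Fallback chain: transcription key, then audio LLM key, then LLM key.
        "audio_api_key": (env.get("AUDIO_TRANSCRIPTION_API_KEY")
                          or env.get("AUDIO_LLM_API_KEY")
                          or env.get("LLM_API_KEY")),
        "worker_id": env.get("WORKER_ID") or socket.gethostname(),
        "worker_poll_interval": parse_duration(env.get("WORKER_POLL_INTERVAL", "2s")),
        "worker_claim_limit": int(env.get("WORKER_CLAIM_LIMIT", "4")),
        "worker_lease_timeout": parse_duration(env.get("WORKER_LEASE_TIMEOUT", "15m")),
    }


# The database URL here is a placeholder value.
config = load_config({"DATABASE_URL": "postgres://localhost/ai", "LLM_API_KEY": "k"})
print(config["llm_timeout"], config["audio_api_key"])
```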
Next integration step
telephony should first mirror low-risk analysis jobs into this service while
continuing local processing. Remote execution can then be enabled by feature
flag per task type.