AI Service
Technical AI job service for Portal workloads.
The first version owns only the AI job lifecycle and metrics. Business data stays in
domain services such as telephony, monitoring-tg and monitoring-pf.
Generic job contract
The service is intentionally domain-agnostic:
- owner_service names the caller, for example telephony, monitoring-tg, monitoring-pf or a future Portal module.
- owner_ref is the caller's stable object reference, for example beeline/{call_id} or channel/{message_id}.
- task_type describes the technical task class, for example transcribe, call_analysis, tg_analysis or pf_competitor_analysis.
- model_profile selects a runtime profile, for example whisperx, qwen2.5-14b, vision or a future provider profile.
- input and result are JSON payloads owned by the caller and the worker.
This keeps the AI service a piece of shared infrastructure rather than a telephony-specific service.
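A minimal caller-side sketch of the contract, in Python. The field names come from this README; the JobRequest class, the sample owner_ref value beeline/12345 and the input payload are hypothetical, and the service may accept or require fields that are not listed here. result is left out because the worker, not the caller, writes it.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class JobRequest:
    """Hypothetical caller-side view of the generic job contract."""
    owner_service: str   # caller, e.g. "telephony" or "monitoring-tg"
    owner_ref: str       # caller's stable reference, e.g. "beeline/{call_id}"
    task_type: str       # technical task class, e.g. "call_analysis"
    model_profile: str   # runtime profile, e.g. "qwen2.5-14b" or "whisperx"
    input: dict[str, Any] = field(default_factory=dict)  # caller-owned JSON

    def to_json(self) -> str:
        # Serialize using the README's snake_case field names.
        return json.dumps(asdict(self))


job = JobRequest(
    owner_service="telephony",
    owner_ref="beeline/12345",  # made-up call id
    task_type="call_analysis",
    model_profile="qwen2.5-14b",
    input={"user": "Classify this call transcript"},
)
print(job.to_json())
```

Because input stays an opaque dictionary, the same class serves telephony, monitoring-tg or monitoring-pf without any domain-specific fields.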
Built-in workers
The first built-in worker processes llm_chat, chat_completion and
call_analysis jobs whose model_profile equals LLM_MODEL.
Input can be either explicit messages:
{
  "messages": [
    {"role": "system", "content": "Answer as JSON."},
    {"role": "user", "content": "Classify this text"}
  ],
  "max_tokens": 256
}
or the compact system and user fields. The result of a completed job contains
content, model, usage and duration_ms.
call_analysis uses the same input contract as llm_chat; callers may include
domain metadata fields in input, but the worker only reads chat fields such as
system, user, messages, max_tokens and response_format.
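The two input shapes can be normalized into one chat request roughly as follows. This is a hypothetical sketch, not the worker's actual code: build_messages and chat_request are illustrative names, and the rule that an explicit messages list wins over system/user when both are present is an assumption. Domain metadata keys are dropped because only the chat fields are copied.

```python
from typing import Any

# Optional chat fields passed through unchanged (names from this README).
CHAT_FIELDS = ("max_tokens", "response_format")


def build_messages(job_input: dict[str, Any]) -> list[dict[str, str]]:
    """Return chat messages from `messages` or from compact `system`/`user`."""
    messages = job_input.get("messages")
    if messages:
        # Assumption: an explicit, non-empty messages list takes precedence.
        return list(messages)
    built = []
    if job_input.get("system"):
        built.append({"role": "system", "content": job_input["system"]})
    if job_input.get("user"):
        built.append({"role": "user", "content": job_input["user"]})
    if not built:
        raise ValueError("input has neither messages nor system/user fields")
    return built


def chat_request(job_input: dict[str, Any], model: str) -> dict[str, Any]:
    """Build an OpenAI-style chat request; other input keys are ignored."""
    request: dict[str, Any] = {"model": model, "messages": build_messages(job_input)}
    for key in CHAT_FIELDS:
        if key in job_input:
            request[key] = job_input[key]
    return request
```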
API
- POST /api/v1/jobs creates one job.
- GET /api/v1/jobs lists jobs with query filters.
- POST /api/v1/jobs/batch creates many jobs with shared defaults.
- POST /api/v1/jobs/retry retries failed/running jobs by filter.
- POST /api/v1/jobs/cancel cancels pending/running jobs by filter.
- POST /api/v1/jobs/claim atomically claims pending jobs for a worker.
- GET /api/v1/jobs/{id} returns technical job state and result.
- POST /api/v1/jobs/{id}/complete stores a successful job result.
- POST /api/v1/jobs/{id}/fail stores the failure category and message of a failed job.
- POST /api/v1/jobs/{id}/retry resets a failed/running job to pending.
- GET /api/v1/stats returns queue and error counters.
- GET /api/v1/providers/status checks configured AI providers without returning secrets.
- GET /api/v1/infra/status returns AI-server sidecar telemetry (GPU, containers, vLLM and WhisperX live metrics) when configured.
- GET /healthz returns process health.
- GET /readyz checks PostgreSQL readiness.
- Built-in workers expose open (unauthenticated) endpoints for Kubernetes on
WORKER_HTTP_PORT: GET /healthz, GET /readyz and GET /worker/status.
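A caller might submit a single job to POST /api/v1/jobs with only the Python standard library, as sketched below. The README does not document the response body, so this sketch assumes it is JSON; create_job, the 10-second timeout and any base URL you pass in are illustrative. The bearer header is sent only when a token is supplied, since the service enforces it only when AI_SERVICE_TOKEN is configured.

```python
import json
import urllib.request
from typing import Any, Optional


def create_job(base_url: str, token: Optional[str], job: dict[str, Any]) -> Any:
    """POST one job to /api/v1/jobs and return the decoded JSON response."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/jobs",
        data=json.dumps(job).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request, timeout=10) as response:
        # Assumption: the service answers with a JSON body.
        return json.loads(response.read().decode("utf-8"))
```

Non-2xx answers (for example a 401 from a missing token) surface as urllib.error.HTTPError, so callers can decide per status whether to retry.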
All /api/v1/* endpoints require Authorization: Bearer <AI_SERVICE_TOKEN>
when AI_SERVICE_TOKEN is configured. Health and readiness endpoints stay open
for Kubernetes probes.
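The token rule can be illustrated as a small WSGI middleware. The README does not say which language or framework the service uses, so this is a stand-in sketch, not the service's implementation: require_bearer is a hypothetical name, and every path outside /api/v1/ (including /healthz and /readyz) passes through without a token.

```python
import hmac
from typing import Callable, Iterable, Optional


def require_bearer(app: Callable, token: Optional[str]) -> Callable:
    """Wrap a WSGI app so /api/v1/* needs `Authorization: Bearer <token>`."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        # No configured token means the API stays open, as documented.
        if token and path.startswith("/api/v1/"):
            header = environ.get("HTTP_AUTHORIZATION", "")
            expected = f"Bearer {token}"
            # Constant-time comparison avoids leaking the token via timing.
            if not hmac.compare_digest(header.encode(), expected.encode()):
                start_response("401 Unauthorized", [("Content-Type", "text/plain")])
                return [b"unauthorized\n"]
        return app(environ, start_response)

    return wrapped
```

Scoping the check by path prefix, rather than listing protected routes, means new /api/v1/ endpoints are protected by default while probe endpoints stay open.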
Configuration
- HTTP_HOST, default 0.0.0.0
- HTTP_PORT, default 8080
- DATABASE_URL, required
- MIGRATE_ON_START, default true
- AI_SERVICE_TOKEN, optional bearer token for service-to-service API calls
- LLM_BASE_URL, primary OpenAI-compatible LLM endpoint
- LLM_API_KEY, primary LLM API key
- LLM_MODEL, default qwen2.5-14b
- LLM_TIMEOUT, default 5m
- WHISPERX_URL, WhisperX endpoint for transcription jobs
- WORKER_ID, default hostname
- WORKER_HTTP_HOST, default 0.0.0.0
- WORKER_HTTP_PORT, default 8081
- WORKER_POLL_INTERVAL, default 2s
- WORKER_CLAIM_LIMIT, default 4
- WORKER_LEASE_TIMEOUT, default 15m
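A sketch of loading a subset of these variables with the documented defaults. It assumes durations are a single number plus a unit (ms, s, m or h), which covers the defaults shown but not compound values such as 1m30s; Config, load_config and parse_duration are hypothetical names, and the accepted spellings of a true MIGRATE_ON_START are a guess. Passing the environment as a mapping lets tests supply a plain dict.

```python
import os
import re
import socket
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: a duration is one integer plus one unit, like the defaults.
_DURATION = re.compile(r"^(\d+)(ms|s|m|h)$")
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}


def parse_duration(value: str) -> float:
    """Convert a duration such as '2s' or '15m' to seconds."""
    match = _DURATION.match(value.strip())
    if not match:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


@dataclass
class Config:
    http_host: str
    http_port: int
    database_url: str
    migrate_on_start: bool
    ai_service_token: Optional[str]
    llm_model: str
    llm_timeout_s: float
    worker_id: str
    worker_poll_interval_s: float
    worker_claim_limit: int
    worker_lease_timeout_s: float


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Read a subset of the documented variables, applying README defaults."""

    def get(name: str, default: str) -> str:
        # Unset and empty variables both fall back to the default.
        return env.get(name) or default

    database_url = env.get("DATABASE_URL")
    if not database_url:
        raise ValueError("DATABASE_URL is required")

    return Config(
        http_host=get("HTTP_HOST", "0.0.0.0"),
        http_port=int(get("HTTP_PORT", "8080")),
        database_url=database_url,
        # Assumption: these spellings count as true.
        migrate_on_start=get("MIGRATE_ON_START", "true").lower() in ("1", "true", "yes"),
        # No token means the /api/v1/* endpoints are not protected.
        ai_service_token=env.get("AI_SERVICE_TOKEN") or None,
        llm_model=get("LLM_MODEL", "qwen2.5-14b"),
        llm_timeout_s=parse_duration(get("LLM_TIMEOUT", "5m")),
        worker_id=get("WORKER_ID", socket.gethostname()),
        worker_poll_interval_s=parse_duration(get("WORKER_POLL_INTERVAL", "2s")),
        worker_claim_limit=int(get("WORKER_CLAIM_LIMIT", "4")),
        worker_lease_timeout_s=parse_duration(get("WORKER_LEASE_TIMEOUT", "15m")),
    )
```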
Next integration step
As a first step, telephony should mirror low-risk analysis jobs into this service
while continuing to process them locally. Remote execution can then be enabled
per task type with a feature flag.
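One possible shape for the per-task-type flag is a three-way mode, sketched below. The mode names, the flags mapping and the plan function are hypothetical; the README only says that mirroring comes first and that remote execution is enabled per task type. Task types with no flag stay local, so moving work to the AI service is opt-in.

```python
from enum import Enum
from typing import Mapping


class Mode(Enum):
    LOCAL = "local"    # process locally only (current behavior)
    MIRROR = "mirror"  # process locally and also submit a copy to the AI service
    REMOTE = "remote"  # submit to the AI service only


def plan(task_type: str, flags: Mapping[str, str]) -> tuple[bool, bool]:
    """Return (run_locally, submit_remote); unflagged task types stay local."""
    mode = Mode(flags.get(task_type, Mode.LOCAL.value))
    return mode is not Mode.REMOTE, mode is not Mode.LOCAL
```

In mirror mode the local result would stay authoritative, so the remote copy can be compared against it before a task type is switched to remote.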