The first version owns only the AI job lifecycle and its metrics. Business data stays in domain services such as telephony, monitoring-tg and monitoring-pf.

Generic job contract

The service is intentionally domain-agnostic:

owner_service names the caller, for example telephony, monitoring-tg, monitoring-pf or a future Portal module.
owner_ref is the caller's stable object reference, for example beeline/{call_id} or channel/{message_id}.
task_type describes the technical task class, for example transcribe, call_analysis, tg_analysis, pf_competitor_analysis.
model_profile selects a runtime profile, for example whisperx, qwen2.5-14b, vision, or a future provider profile.
input and result are JSON payloads owned by the caller and worker.

This keeps the AI service shared infrastructure rather than a telephony-specific service.
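
As an illustration of the contract above, here is a minimal Python sketch of a job payload. The field names come from the list above; the JobRequest class, the concrete call id and the shape of input are hypothetical, and the real request schema may carry more fields.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class JobRequest:
    """Generic job fields from the contract; the class itself is hypothetical."""

    owner_service: str   # who is calling, e.g. "telephony"
    owner_ref: str       # caller's stable object reference
    task_type: str       # technical task class
    model_profile: str   # runtime profile to run the task on
    input: dict[str, Any] = field(default_factory=dict)  # caller-owned JSON


job = JobRequest(
    owner_service="telephony",
    owner_ref="beeline/12345",  # hypothetical call_id
    task_type="transcribe",
    model_profile="whisperx",
    input={"audio_url": "https://example.invalid/rec.wav"},  # hypothetical shape
)

# Serialized body a caller might send when creating the job.
body = json.dumps(asdict(job), sort_keys=True)
```

The service never interprets input here: it is stored and handed to the worker as-is, which is what keeps the contract domain-agnostic.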

API

POST /api/v1/jobs creates one job.
POST /api/v1/jobs/batch creates many jobs with shared defaults.
POST /api/v1/jobs/claim atomically claims pending jobs for a worker.
GET /api/v1/jobs/{id} returns technical job state and result.
POST /api/v1/jobs/{id}/complete stores a successful job result.
POST /api/v1/jobs/{id}/fail stores the failure category and message for a failed job.
POST /api/v1/jobs/{id}/retry resets a failed or running job to pending.
GET /api/v1/stats returns queue and error counters.
GET /api/v1/providers/status checks configured AI providers without returning secrets.
GET /healthz returns process health.
GET /readyz checks PostgreSQL readiness.
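
A worker would typically use the claim, complete and fail endpoints in sequence. The sketch below builds those requests with the Python standard library only. The paths are the ones listed above; the base URL and every body field (worker_id, limit, result, category, message) are assumptions, since the request schemas are not specified here.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:8080"  # assumption: local service on the default port


def build_request(method: str, path: str,
                  body: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    """Build a JSON request for the service without sending it."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def claim_request(worker_id: str, limit: int) -> urllib.request.Request:
    # Hypothetical body: which worker is claiming and how many jobs it wants.
    return build_request("POST", "/api/v1/jobs/claim",
                         {"worker_id": worker_id, "limit": limit})


def complete_request(job_id: str, result: dict[str, Any]) -> urllib.request.Request:
    # Hypothetical body: the worker-owned result JSON.
    return build_request("POST", f"/api/v1/jobs/{job_id}/complete",
                         {"result": result})


def fail_request(job_id: str, category: str, message: str) -> urllib.request.Request:
    # Hypothetical body: failure category plus a human-readable message.
    return build_request("POST", f"/api/v1/jobs/{job_id}/fail",
                         {"category": category, "message": message})


def send(req: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send a request and decode the JSON response, if there is one."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read()
        return json.loads(raw) if raw else None
```

A worker loop would call send(claim_request(...)), process each returned job, and then send either complete_request or fail_request for it.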

Configuration

HTTP_HOST, default 0.0.0.0
HTTP_PORT, default 8080
DATABASE_URL, required
MIGRATE_ON_START, default true
LLM_BASE_URL, primary OpenAI-compatible LLM endpoint
LLM_API_KEY, primary LLM API key
LLM_MODEL, default qwen2.5-14b
LLM_TIMEOUT, default 5m
WHISPERX_URL, WhisperX endpoint for transcription jobs
OPENCLAW_URL, optional OpenClaw gateway URL, used if we route through OpenClaw instead of calling vLLM directly
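
The variables and defaults above can be read as in the following Python sketch. It is illustrative only and not the service's own loader: the Config class and helper names are hypothetical, the accepted boolean spellings are an assumption, and the timeout is assumed to use a Go-like duration syntax in which 5m means five minutes.

```python
import os
import re
from dataclasses import dataclass, field
from typing import Mapping, Optional

_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text: str) -> float:
    """Parse a duration such as '5m' or '1m30s' into seconds (assumed syntax)."""
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|s|m|h)", text)
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"invalid duration: {text!r}")
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in parts)


def parse_bool(text: str) -> bool:
    # Assumption: these spellings count as true, anything else as false.
    return text.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Config:
    http_host: str
    http_port: int
    database_url: str
    migrate_on_start: bool
    llm_base_url: Optional[str]
    llm_api_key: Optional[str] = field(repr=False)  # keep the secret out of repr/logs
    llm_model: str = "qwen2.5-14b"
    llm_timeout_seconds: float = 300.0
    whisperx_url: Optional[str] = None
    openclaw_url: Optional[str] = None


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Read settings from the environment, applying the documented defaults."""
    database_url = env.get("DATABASE_URL")
    if not database_url:
        raise ValueError("DATABASE_URL is required")
    return Config(
        http_host=env.get("HTTP_HOST", "0.0.0.0"),
        http_port=int(env.get("HTTP_PORT", "8080")),
        database_url=database_url,
        migrate_on_start=parse_bool(env.get("MIGRATE_ON_START", "true")),
        llm_base_url=env.get("LLM_BASE_URL"),
        llm_api_key=env.get("LLM_API_KEY"),
        llm_model=env.get("LLM_MODEL", "qwen2.5-14b"),
        llm_timeout_seconds=parse_duration(env.get("LLM_TIMEOUT", "5m")),
        whisperx_url=env.get("WHISPERX_URL"),
        openclaw_url=env.get("OPENCLAW_URL"),
    )
```

Failing fast on a missing DATABASE_URL mirrors the fact that it is the only variable marked as required.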

Next integration step

telephony should first mirror low-risk analysis jobs into this service while continuing to process them locally. Remote execution can then be enabled per task type behind a feature flag.
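
One possible shape for this rollout on the telephony side is sketched below in Python. Everything in it is an assumption made for illustration: the process function, the run_local and submit_job callables, the flag mapping, and the choice to treat mirroring as best-effort so that a queue outage cannot break local processing.

```python
from typing import Any, Callable, Mapping

Payload = dict[str, Any]


def process(
    task_type: str,
    payload: Payload,
    run_local: Callable[[Payload], Any],
    submit_job: Callable[[str, Payload], str],
    remote_flags: Mapping[str, bool],
) -> dict[str, Any]:
    """Mirror every job to the AI service; run remotely only when flagged."""
    if remote_flags.get(task_type, False):
        # Flag on: the AI service owns execution, the caller keeps the job id.
        return {"mode": "remote", "job_id": submit_job(task_type, payload)}

    # Flag off: mirror the job, but never let mirroring affect local work.
    mirrored_job_id = None
    try:
        mirrored_job_id = submit_job(task_type, payload)
    except Exception:
        pass  # best-effort mirror; local processing continues regardless

    return {
        "mode": "local",
        "result": run_local(payload),
        "mirrored_job_id": mirrored_job_id,
    }
```

With all flags off, the mirrored jobs can be compared against local results before any task type is switched to remote execution.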

OpenClaw note

Current Portal services call the local AI server directly: vLLM for LLM tasks and WhisperX for transcription. OpenClaw is not required for the current ai-service queue deployment. It becomes useful once we need centralized model routing, provider fallback, request policies or other cross-model gateway behavior.