AI Service
Technical AI job service for Portal workloads.
The first version owns only AI job lifecycle and metrics. Business data stays in
domain services such as telephony, monitoring-tg and monitoring-pf.
Generic job contract
The service is intentionally domain-agnostic:
- owner_service names the caller, for example telephony, monitoring-tg, monitoring-pf or a future Portal module.
- owner_ref is the caller's stable object reference, for example beeline/{call_id} or channel/{message_id}.
- task_type describes the technical task class, for example transcribe, call_analysis, tg_analysis, pf_competitor_analysis.
- model_profile selects a runtime profile, for example voxtral-small, qwen2.5-14b, vision, or a future provider profile.
- input and result are JSON payloads owned by the caller and worker.
This keeps AI Service as shared infrastructure rather than a telephony-specific service.
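As an illustration of the contract, a caller might assemble a job body as in the sketch below. The field names come from the list above; the flat JSON layout, the `JobRequest` and `build_job_body` names and the example call id are assumptions, not the service's confirmed wire format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any


@dataclass
class JobRequest:
    """Caller-side view of the generic job contract (hypothetical helper type)."""
    owner_service: str
    owner_ref: str
    task_type: str
    model_profile: str
    input: dict[str, Any] = field(default_factory=dict)


def build_job_body(job: JobRequest) -> str:
    """Serialize the job to JSON; a flat object layout is assumed here."""
    return json.dumps(asdict(job), sort_keys=True)


job = JobRequest(
    owner_service="telephony",
    owner_ref="beeline/12345",  # hypothetical call id
    task_type="call_analysis",
    model_profile="qwen2.5-14b",
    input={"system": "Answer as JSON.", "user": "Classify this text"},
)
body = build_job_body(job)
```

Nothing in the body is telephony-specific: another caller would change only owner_service, owner_ref and the payload.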
Built-in workers
The first built-in worker processes llm_chat, chat_completion and
call_analysis jobs whose model_profile equals LLM_MODEL.
Input can be either explicit messages:
{
  "messages": [
    {"role": "system", "content": "Answer as JSON."},
    {"role": "user", "content": "Classify this text"}
  ],
  "max_tokens": 256
}
or compact system / user fields. The completed job result contains
content, model, usage and duration_ms.
call_analysis uses the same input contract as llm_chat; callers may include
domain metadata fields in input, but the worker only reads chat fields such as
system, user, messages, max_tokens and response_format.
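The two input shapes can be reduced to a single chat request. The sketch below shows one way to do that; it assumes explicit messages take precedence when both shapes are present, and `normalize_chat_input` is a hypothetical name rather than the worker's actual function.

```python
from typing import Any

# Optional chat fields the worker is described as reading.
OPTIONAL_CHAT_FIELDS = ("max_tokens", "response_format")


def normalize_chat_input(payload: dict[str, Any]) -> dict[str, Any]:
    """Build a chat request from explicit messages or compact system/user fields.

    Any other keys (for example domain metadata) are ignored.
    """
    messages = payload.get("messages")
    if not messages:
        messages = []
        if payload.get("system"):
            messages.append({"role": "system", "content": payload["system"]})
        if payload.get("user"):
            messages.append({"role": "user", "content": payload["user"]})
    if not messages:
        raise ValueError("input needs either messages or system/user fields")

    request: dict[str, Any] = {"messages": list(messages)}
    for key in OPTIONAL_CHAT_FIELDS:
        if key in payload:
            request[key] = payload[key]
    return request
```

A call_analysis input that carries extra metadata, such as a call id, would pass through this function with the metadata dropped.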
transcription jobs are processed only by Voxtral Small
(mistralai/Voxtral-Small-24B-2507) through an OpenAI-compatible
/v1/audio/transcriptions endpoint. The returned segments field stays
compatible with telephony. If the provider returns one long segment, AI Service
splits it into smaller transcript segments and adds heuristic speaker labels
when diarization is requested.
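The splitting step could look roughly like the sketch below. The source does not describe the actual heuristic, so everything here is an assumption: the segment keys (start, end, text, speaker), the sentence-boundary split, the time allocation proportional to text length, and the alternating speaker labels.

```python
import re
from typing import Any


def split_segment(segment: dict[str, Any], diarize: bool = False) -> list[dict[str, Any]]:
    """Split one long segment into sentence-sized pieces (illustrative heuristic)."""
    text = segment["text"].strip()
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    if len(sentences) <= 1:
        return [dict(segment)]

    total_chars = sum(len(s) for s in sentences)
    duration = segment["end"] - segment["start"]
    pieces: list[dict[str, Any]] = []
    cursor = segment["start"]
    for index, sentence in enumerate(sentences):
        # Give each sentence a share of the time proportional to its length.
        span = duration * len(sentence) / total_chars
        piece = {"start": round(cursor, 3), "end": round(cursor + span, 3), "text": sentence}
        if diarize:
            # Naive assumption: two speakers who strictly alternate.
            piece["speaker"] = f"speaker_{index % 2}"
        pieces.append(piece)
        cursor += span
    pieces[-1]["end"] = segment["end"]  # remove accumulated rounding drift
    return pieces
```

Labels produced this way are guesses rather than true diarization, so callers should treat them as hints.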
The AI-server compose snippet for Voxtral lives in
deploy/ai-server/docker-compose.audio.yml:
- Voxtral endpoint: http://10.2.3.5:8004
- Start Voxtral: docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile voxtral-small up -d voxtral-small
API
- POST /api/v1/jobs creates one job.
- GET /api/v1/jobs lists jobs with query filters.
- POST /api/v1/jobs/batch creates many jobs with shared defaults.
- POST /api/v1/jobs/retry retries failed/running jobs by filter.
- POST /api/v1/jobs/cancel cancels pending/running jobs by filter.
- POST /api/v1/jobs/claim atomically claims pending jobs for a worker.
- GET /api/v1/jobs/{id} returns technical job state and result.
- POST /api/v1/jobs/{id}/complete stores a successful job result.
- POST /api/v1/jobs/{id}/fail stores a failed job category and message.
- POST /api/v1/jobs/{id}/retry resets failed/running jobs to pending.
- GET /api/v1/stats returns queue and error counters.
- GET /api/v1/providers/status checks configured AI providers without returning secrets.
- GET /api/v1/infra/status returns AI-server sidecar telemetry (GPU, containers and vLLM live metrics) when configured.
- GET /healthz returns process health.
- GET /readyz checks PostgreSQL readiness.
- Built-in workers expose open Kubernetes endpoints on WORKER_HTTP_PORT: GET /healthz, GET /readyz and GET /worker/status.
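The claim, complete and fail endpoints suggest a simple worker cycle, sketched below against an injected client so that no request or response schema is implied. The `JobClient` protocol, the `run_once` function, the "id" key on a claimed job and the "worker_error" category are all hypothetical.

```python
from typing import Any, Callable, Protocol


class JobClient(Protocol):
    """Hypothetical thin wrapper over the claim / complete / fail endpoints."""

    def claim(self, worker_id: str, limit: int) -> list[dict[str, Any]]: ...
    def complete(self, job_id: str, result: dict[str, Any]) -> None: ...
    def fail(self, job_id: str, category: str, message: str) -> None: ...


def run_once(
    client: JobClient,
    worker_id: str,
    limit: int,
    handler: Callable[[dict[str, Any]], dict[str, Any]],
) -> int:
    """Claim up to `limit` jobs, process each one and report the outcome."""
    jobs = client.claim(worker_id, limit)
    for job in jobs:
        try:
            result = handler(job)
        except Exception as exc:  # report any handler error as a failed job
            client.fail(job["id"], "worker_error", str(exc))
        else:
            client.complete(job["id"], result)
    return len(jobs)
```

A real worker would call `run_once` in a loop and sleep for WORKER_POLL_INTERVAL whenever it returns 0.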
All /api/v1/* endpoints require Authorization: Bearer <AI_SERVICE_TOKEN>
when AI_SERVICE_TOKEN is configured. Health and readiness endpoints stay open
for Kubernetes probes.
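A client can follow this rule by attaching the bearer header only to /api/v1/* paths. The sketch below uses only the Python standard library and builds the request without sending it; `build_request` and the localhost base URL are illustrative assumptions.

```python
import urllib.request
from typing import Optional


def build_request(
    base_url: str,
    path: str,
    token: Optional[str] = None,
    method: str = "GET",
    body: Optional[bytes] = None,
) -> urllib.request.Request:
    """Build a request, adding the bearer token only for /api/v1/* paths."""
    headers = {"Accept": "application/json"}
    if body is not None:
        headers["Content-Type"] = "application/json"
    if token and path.startswith("/api/v1/"):
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=body, headers=headers, method=method
    )


stats_request = build_request("http://localhost:8080", "/api/v1/stats", token="secret")
probe_request = build_request("http://localhost:8080", "/healthz", token="secret")
```

Passing either request to `urllib.request.urlopen` would send it; the probe request carries no token, matching the open health endpoints.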
Configuration
- HTTP_HOST, default 0.0.0.0
- HTTP_PORT, default 8080
- DATABASE_URL, required
- MIGRATE_ON_START, default true
- AI_SERVICE_TOKEN, optional bearer token for service-to-service API calls
- LLM_BASE_URL, primary OpenAI-compatible LLM endpoint
- LLM_API_KEY, primary LLM API key
- LLM_MODEL, default qwen2.5-14b
- LLM_TIMEOUT, default 5m
- VOXTRAL_BASE_URL, OpenAI-compatible endpoint for Voxtral
- VOXTRAL_MODEL, default mistralai/Voxtral-Small-24B-2507
- VOXTRAL_API_KEY, optional bearer token for Voxtral; falls back to AUDIO_LLM_API_KEY, then LLM_API_KEY
- AUDIO_LLM_PROMPT, transcription instruction for Voxtral
- WORKER_ID, default hostname
- WORKER_HTTP_HOST, default 0.0.0.0
- WORKER_HTTP_PORT, default 8081
- WORKER_POLL_INTERVAL, default 2s
- WORKER_CLAIM_LIMIT, default 4
- WORKER_LEASE_TIMEOUT, default 15m
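To make the defaults and the Voxtral key fallback concrete, the sketch below resolves a subset of these variables from a mapping. It assumes duration values such as 5m use a number-plus-unit syntax; `parse_duration` and `load_config` are hypothetical helpers, not the service's own loader.

```python
import re
import socket
from typing import Any, Mapping

_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(value: str) -> float:
    """Parse strings like '2s', '15m' or '1h30m' into seconds (assumed syntax)."""
    parts = re.findall(r"(\d+(?:\.\d+)?)(ms|s|m|h)", value)
    if not parts or "".join(number + unit for number, unit in parts) != value:
        raise ValueError(f"unsupported duration: {value!r}")
    return sum(float(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def load_config(env: Mapping[str, str]) -> dict[str, Any]:
    """Resolve a subset of the settings with their documented defaults."""
    if not env.get("DATABASE_URL"):
        raise ValueError("DATABASE_URL is required")
    return {
        "http_host": env.get("HTTP_HOST", "0.0.0.0"),
        "http_port": int(env.get("HTTP_PORT", "8080")),
        "database_url": env["DATABASE_URL"],
        "llm_model": env.get("LLM_MODEL", "qwen2.5-14b"),
        "llm_timeout_s": parse_duration(env.get("LLM_TIMEOUT", "5m")),
        # Fallback chain: VOXTRAL_API_KEY, then AUDIO_LLM_API_KEY, then LLM_API_KEY.
        "voxtral_api_key": env.get("VOXTRAL_API_KEY")
        or env.get("AUDIO_LLM_API_KEY")
        or env.get("LLM_API_KEY"),
        "worker_id": env.get("WORKER_ID") or socket.gethostname(),
        "worker_poll_interval_s": parse_duration(env.get("WORKER_POLL_INTERVAL", "2s")),
        "worker_claim_limit": int(env.get("WORKER_CLAIM_LIMIT", "4")),
        "worker_lease_timeout_s": parse_duration(env.get("WORKER_LEASE_TIMEOUT", "15m")),
    }
```

Calling `load_config(os.environ)` after `import os` would read the live environment; taking a mapping keeps the function easy to test.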
Next integration step
telephony should first mirror low-risk analysis jobs into this service while
continuing local processing. Remote execution can then be enabled by feature
flag per task type.
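One way to picture the rollout is a per-task-type mode with three states, as in the sketch below. The `Mode` names, the flag mapping and the `plan` function are hypothetical illustrations, not an existing telephony feature.

```python
from enum import Enum
from typing import Mapping


class Mode(Enum):
    LOCAL = "local"    # process locally only (default)
    MIRROR = "mirror"  # process locally and also submit to AI Service
    REMOTE = "remote"  # rely on AI Service for execution


def plan(task_type: str, flags: Mapping[str, Mode]) -> tuple[bool, bool]:
    """Return (run_locally, submit_to_ai_service) for a task type."""
    mode = flags.get(task_type, Mode.LOCAL)
    run_locally = mode in (Mode.LOCAL, Mode.MIRROR)
    submit_remote = mode in (Mode.MIRROR, Mode.REMOTE)
    return run_locally, submit_remote


flags = {"call_analysis": Mode.MIRROR, "transcribe": Mode.REMOTE}
```

Task types without a flag stay on local processing, so each one can be moved to mirroring and then to remote execution independently.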