Grendgi e6ae792325
Drop legacy audio config aliases
2026-06-10 14:28:52 +03:00

AI Service

Technical AI job service for Portal workloads.

AI Service owns the technical AI job lifecycle, provider execution and metrics. Business data stays in domain services such as telephony, monitoring-tg and monitoring-pf.

Generic job contract

The service is intentionally domain-agnostic:

  • owner_service names the caller, for example telephony, monitoring-tg, monitoring-pf or a future Portal module.
  • owner_ref is the caller's stable object reference, for example beeline/{call_id} or channel/{message_id}.
  • task_type describes the technical task class, for example transcription, transcript_summary, call_analysis, telegram_classification, tg_analysis, pf_competitor_analysis.
  • model_profile selects a runtime profile, for example whisper-large-v3, qwen2.5-14b, vision, or a future provider profile.
  • input and result are JSON payloads owned by the caller and worker.

This keeps AI Service a piece of shared infrastructure rather than a telephony-specific service.
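
As an illustration of the contract, the sketch below models a job as a Go struct and builds a create-job payload for a telephony call. Only the JSON keys come from this README; the Go type name, the NewTranscriptionJob helper and the audio_url input field are assumptions made for the example, not the service's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Job mirrors the generic contract fields. The Go names are illustrative;
// only the JSON keys are taken from the README.
type Job struct {
	OwnerService string          `json:"owner_service"`
	OwnerRef     string          `json:"owner_ref"`
	TaskType     string          `json:"task_type"`
	ModelProfile string          `json:"model_profile"`
	Input        json.RawMessage `json:"input"`
	Result       json.RawMessage `json:"result,omitempty"`
}

// NewTranscriptionJob builds a payload for a telephony call.
// The input shape (audio_url) is hypothetical.
func NewTranscriptionJob(callID, audioURL string) (Job, error) {
	input, err := json.Marshal(map[string]string{"audio_url": audioURL})
	if err != nil {
		return Job{}, err
	}
	return Job{
		OwnerService: "telephony",
		OwnerRef:     "beeline/" + callID,
		TaskType:     "transcription",
		ModelProfile: "whisper-large-v3",
		Input:        input,
	}, nil
}

func main() {
	job, err := NewTranscriptionJob("12345", "https://example.invalid/call.mp3")
	if err != nil {
		panic(err)
	}
	body, err := json.MarshalIndent(job, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Keeping input and result as raw JSON means the service never needs to understand caller-specific payloads, which matches the domain-agnostic intent described above.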

Built-in workers

The LLM worker processes llm_chat, chat_completion, call_analysis, transcript_summary and telegram_classification jobs whose model_profile equals LLM_MODEL.
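
The routing rule above can be sketched as a small predicate. The function name LLMWorkerHandles and the lookup table are hypothetical; the task types and the "model_profile equals LLM_MODEL" condition are taken from the sentence above.

```go
package main

import "fmt"

// llmTaskTypes lists the task types the README assigns to the LLM worker.
var llmTaskTypes = map[string]bool{
	"llm_chat":                true,
	"chat_completion":         true,
	"call_analysis":           true,
	"transcript_summary":      true,
	"telegram_classification": true,
}

// LLMWorkerHandles reports whether a job would be picked up by the LLM
// worker: the task type must be supported and the job's model_profile
// must equal the configured LLM_MODEL.
func LLMWorkerHandles(taskType, modelProfile, llmModel string) bool {
	return llmTaskTypes[taskType] && modelProfile == llmModel
}

func main() {
	fmt.Println(LLMWorkerHandles("call_analysis", "qwen2.5-14b", "qwen2.5-14b"))      // true
	fmt.Println(LLMWorkerHandles("transcription", "whisper-large-v3", "qwen2.5-14b")) // false
}
```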

Input can be either explicit messages:

{
  "messages": [
    {"role": "system", "content": "Answer as JSON."},
    {"role": "user", "content": "Classify this text"}
  ],
  "max_tokens": 256
}

or compact system / user fields. The completed job result contains content, model, usage and duration_ms.

call_analysis and transcript_summary use the same input contract as llm_chat; callers may include domain metadata fields in input, but the worker only reads chat fields such as system, user, messages, max_tokens and response_format.
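
A minimal sketch of how a worker might normalize the two input forms into one message list. The type and function names are hypothetical, and letting explicit messages take precedence over system / user is an assumption. Unknown fields such as caller metadata are simply ignored by encoding/json, which is consistent with the behaviour described above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Message is one chat message in the OpenAI-compatible format.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ChatInput covers both input forms; fields not listed here are ignored.
type ChatInput struct {
	Messages  []Message `json:"messages,omitempty"`
	System    string    `json:"system,omitempty"`
	User      string    `json:"user,omitempty"`
	MaxTokens int       `json:"max_tokens,omitempty"`
}

// NormalizeMessages returns explicit messages if present, otherwise it
// builds them from the compact system / user fields (assumed precedence).
func NormalizeMessages(raw []byte) ([]Message, error) {
	var in ChatInput
	if err := json.Unmarshal(raw, &in); err != nil {
		return nil, fmt.Errorf("decode input: %w", err)
	}
	if len(in.Messages) > 0 {
		return in.Messages, nil
	}
	var msgs []Message
	if in.System != "" {
		msgs = append(msgs, Message{Role: "system", Content: in.System})
	}
	if in.User != "" {
		msgs = append(msgs, Message{Role: "user", Content: in.User})
	}
	if len(msgs) == 0 {
		return nil, errors.New("input has neither messages nor system/user fields")
	}
	return msgs, nil
}

func main() {
	// call_id stands in for caller-owned metadata that the worker does not read.
	raw := []byte(`{"system":"Answer as JSON.","user":"Classify this text","call_id":"c1"}`)
	msgs, err := NormalizeMessages(raw)
	if err != nil {
		panic(err)
	}
	for _, m := range msgs {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```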

transcription jobs are processed only by Whisper Large v3 (openai/whisper-large-v3) through an OpenAI-compatible /v1/audio/transcriptions endpoint. The returned segments field stays compatible with telephony. If the provider returns one long segment, AI Service splits it into smaller transcript segments without inventing speaker labels.
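
The README does not say how long segments are split. The sketch below shows one possible approach under stated assumptions: cut the text at sentence boundaries and spread the original time range across the pieces in proportion to their length. The Segment field names (start, end, text) follow the common OpenAI-style verbose response and are an assumption here. No speaker field is added.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// Segment is a transcript piece with start and end times in seconds.
type Segment struct {
	Start float64 `json:"start"`
	End   float64 `json:"end"`
	Text  string  `json:"text"`
}

// splitSentences cuts text after '.', '!' or '?' when the mark is
// followed by a space or ends the text.
func splitSentences(text string) []string {
	var out []string
	var b strings.Builder
	runes := []rune(text)
	for i, r := range runes {
		b.WriteRune(r)
		mark := r == '.' || r == '!' || r == '?'
		if mark && (i == len(runes)-1 || runes[i+1] == ' ') {
			if s := strings.TrimSpace(b.String()); s != "" {
				out = append(out, s)
			}
			b.Reset()
		}
	}
	if s := strings.TrimSpace(b.String()); s != "" {
		out = append(out, s)
	}
	return out
}

// SplitSegment divides one long segment into sentence-sized segments and
// assigns each a share of the time range proportional to its rune count.
func SplitSegment(seg Segment) []Segment {
	parts := splitSentences(seg.Text)
	if len(parts) <= 1 {
		return []Segment{seg}
	}
	total := 0
	for _, p := range parts {
		total += utf8.RuneCountInString(p)
	}
	out := make([]Segment, 0, len(parts))
	cursor := seg.Start
	duration := seg.End - seg.Start
	for i, p := range parts {
		end := cursor + duration*float64(utf8.RuneCountInString(p))/float64(total)
		if i == len(parts)-1 {
			end = seg.End // avoid floating-point drift on the last piece
		}
		out = append(out, Segment{Start: cursor, End: end, Text: p})
		cursor = end
	}
	return out
}

func main() {
	long := Segment{Start: 0, End: 10, Text: "Hello there. How are you?"}
	for _, s := range SplitSegment(long) {
		fmt.Printf("%.1f-%.1f %s\n", s.Start, s.End, s.Text)
	}
}
```

The timings produced this way are estimates; a real implementation could use word-level timestamps if the provider returns them.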

The AI-server compose snippet for Whisper Large v3 lives in deploy/ai-server/docker-compose.audio.yml:

  • Whisper endpoint: http://10.2.3.5:8004
  • Start Whisper: docker compose -f docker-compose.yml -f docker-compose.audio.yml --profile whisper-large-v3 up -d whisper-large-v3

In Kubernetes, the dedicated transcription worker may claim more than one whisper-large-v3 job at a time. This keeps download/upload/wait overhead from serializing the queue, while the Whisper provider still controls the actual GPU scheduling.
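
A minimal sketch of the idea, assuming the worker has already claimed a batch of jobs: each claimed job runs in its own goroutine, so I/O waits overlap instead of queuing. The ClaimedJob type and ProcessBatch function are hypothetical. Concurrency is bounded by the batch size, which would correspond to the claim limit.

```go
package main

import (
	"fmt"
	"sync"
)

// ClaimedJob is a placeholder for a job returned by the claim endpoint.
type ClaimedJob struct{ ID string }

// ProcessBatch runs handle for every claimed job concurrently, waits for
// all of them, and returns the errors keyed by job ID.
func ProcessBatch(jobs []ClaimedJob, handle func(ClaimedJob) error) map[string]error {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		errs = make(map[string]error)
	)
	for _, job := range jobs {
		wg.Add(1)
		go func(j ClaimedJob) {
			defer wg.Done()
			if err := handle(j); err != nil {
				mu.Lock()
				errs[j.ID] = err
				mu.Unlock()
			}
		}(job)
	}
	wg.Wait()
	return errs
}

func main() {
	jobs := []ClaimedJob{{ID: "a"}, {ID: "b"}, {ID: "c"}}
	errs := ProcessBatch(jobs, func(j ClaimedJob) error {
		if j.ID == "b" {
			return fmt.Errorf("provider timeout for job %s", j.ID)
		}
		return nil
	})
	fmt.Println(len(errs), "job(s) failed")
}
```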

API

  • POST /api/v1/jobs creates one job.
  • GET /api/v1/jobs lists jobs with query filters.
  • POST /api/v1/jobs/batch creates many jobs with shared defaults.
  • POST /api/v1/jobs/retry retries failed/running jobs by filter.
  • POST /api/v1/jobs/cancel cancels pending/running jobs by filter.
  • POST /api/v1/jobs/claim atomically claims pending jobs for a worker.
  • GET /api/v1/jobs/{id} returns technical job state and result.
  • POST /api/v1/jobs/{id}/complete stores a successful job result.
  • POST /api/v1/jobs/{id}/fail stores a failed job category and message.
  • POST /api/v1/jobs/{id}/retry resets failed/running jobs to pending.
  • GET /api/v1/stats returns queue and error counters.
  • GET /api/v1/providers/status checks configured AI providers without returning secrets.
  • GET /api/v1/infra/status returns AI-server sidecar telemetry (GPU, containers and vLLM live metrics) when configured.
  • GET /healthz returns process health.
  • GET /readyz checks PostgreSQL readiness.
  • Built-in workers expose open (unauthenticated) endpoints for Kubernetes on WORKER_HTTP_PORT: GET /healthz, GET /readyz and GET /worker/status.

All /api/v1/* endpoints require Authorization: Bearer <AI_SERVICE_TOKEN> when AI_SERVICE_TOKEN is configured. Health and readiness endpoints stay open for Kubernetes probes.
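
The rule above can be sketched as HTTP middleware. This is an illustration rather than the service's actual code: the names Authorized and RequireBearer are hypothetical, and the constant-time comparison is a choice made for the example.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Authorized reports whether a request may proceed. Paths outside
// /api/v1/ are always open, and everything is open when no token is set.
func Authorized(token, path, header string) bool {
	if token == "" || !strings.HasPrefix(path, "/api/v1/") {
		return true
	}
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got := strings.TrimPrefix(header, prefix)
	return subtle.ConstantTimeCompare([]byte(got), []byte(token)) == 1
}

// RequireBearer wraps a handler and rejects unauthorized API calls with 401.
func RequireBearer(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !Authorized(token, r.URL.Path, r.Header.Get("Authorization")) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	handler := RequireBearer("secret", mux)

	// No Authorization header is sent: probes pass, the API call does not.
	for _, path := range []string{"/healthz", "/api/v1/stats"} {
		rec := httptest.NewRecorder()
		handler.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
		fmt.Println(path, rec.Code)
	}
}
```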

Configuration

  • HTTP_HOST, default 0.0.0.0
  • HTTP_PORT, default 8080
  • DATABASE_URL, required
  • MIGRATE_ON_START, default true
  • AI_SERVICE_TOKEN, optional bearer token for service-to-service API calls
  • LLM_BASE_URL, primary OpenAI-compatible LLM endpoint
  • LLM_API_KEY, primary LLM API key
  • LLM_MODEL, default qwen2.5-14b
  • LLM_TIMEOUT, default 5m
  • AUDIO_TRANSCRIPTION_BASE_URL, OpenAI-compatible transcription endpoint
  • AUDIO_TRANSCRIPTION_MODEL, default openai/whisper-large-v3
  • AUDIO_TRANSCRIPTION_API_KEY, optional bearer token; falls back to AUDIO_LLM_API_KEY, then LLM_API_KEY
  • AUDIO_TRANSCRIPTION_PROMPT, transcription instruction
  • WORKER_ID, default hostname
  • WORKER_HTTP_HOST, default 0.0.0.0
  • WORKER_HTTP_PORT, default 8081
  • WORKER_POLL_INTERVAL, default 2s
  • WORKER_CLAIM_LIMIT, default 4
  • WORKER_LEASE_TIMEOUT, default 15m
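
For illustration, a few of these settings could be loaded as follows. The struct and helper names are hypothetical; the variable names, defaults and the API key fallback order come from the list above. Duration values such as 2s and 15m are in the format accepted by Go's time.ParseDuration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// WorkerConfig holds a subset of the documented settings.
type WorkerConfig struct {
	PollInterval time.Duration
	ClaimLimit   int
	LeaseTimeout time.Duration
	AudioAPIKey  string
}

// firstNonEmpty returns the first non-empty value, or "" if all are empty.
func firstNonEmpty(values ...string) string {
	for _, v := range values {
		if v != "" {
			return v
		}
	}
	return ""
}

// durationOr parses the variable as a duration, or returns def when unset.
func durationOr(getenv func(string) string, key string, def time.Duration) (time.Duration, error) {
	v := getenv(key)
	if v == "" {
		return def, nil
	}
	d, err := time.ParseDuration(v)
	if err != nil {
		return 0, fmt.Errorf("%s: %w", key, err)
	}
	return d, nil
}

// LoadWorkerConfig reads settings through getenv so it can be tested
// without touching the process environment.
func LoadWorkerConfig(getenv func(string) string) (WorkerConfig, error) {
	var cfg WorkerConfig
	var err error
	if cfg.PollInterval, err = durationOr(getenv, "WORKER_POLL_INTERVAL", 2*time.Second); err != nil {
		return cfg, err
	}
	if cfg.LeaseTimeout, err = durationOr(getenv, "WORKER_LEASE_TIMEOUT", 15*time.Minute); err != nil {
		return cfg, err
	}
	cfg.ClaimLimit = 4
	if v := getenv("WORKER_CLAIM_LIMIT"); v != "" {
		if cfg.ClaimLimit, err = strconv.Atoi(v); err != nil {
			return cfg, fmt.Errorf("WORKER_CLAIM_LIMIT: %w", err)
		}
	}
	// Fallback order as documented for AUDIO_TRANSCRIPTION_API_KEY.
	cfg.AudioAPIKey = firstNonEmpty(
		getenv("AUDIO_TRANSCRIPTION_API_KEY"),
		getenv("AUDIO_LLM_API_KEY"),
		getenv("LLM_API_KEY"),
	)
	return cfg, nil
}

func main() {
	cfg, err := LoadWorkerConfig(os.Getenv)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.PollInterval, cfg.ClaimLimit, cfg.LeaseTimeout)
}
```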

Current telephony pipeline

telephony now uses AI Service as the only AI execution path:

  1. transcription turns call audio into segments.
  2. transcript_summary creates a detailed Russian call summary.
  3. call_analysis runs tags and negotiation rules against the summary.
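
The ordering of the three steps can be expressed as a small lookup that a caller might use to decide which job to create once the previous one completes. This is a sketch only: the NextTask helper is hypothetical and the README does not describe how telephony actually chains the jobs.

```go
package main

import "fmt"

// telephonyPipeline lists the task types in the order given above.
var telephonyPipeline = []string{"transcription", "transcript_summary", "call_analysis"}

// NextTask returns the task type that follows current in the pipeline.
// The boolean is false for the last step and for unknown task types.
func NextTask(current string) (string, bool) {
	for i, t := range telephonyPipeline {
		if t == current && i+1 < len(telephonyPipeline) {
			return telephonyPipeline[i+1], true
		}
	}
	return "", false
}

func main() {
	step := "transcription"
	for {
		fmt.Println(step)
		next, ok := NextTask(step)
		if !ok {
			break
		}
		step = next
	}
}
```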