# AI Service

Technical AI job service for Portal workloads.

The first version owns only the AI job lifecycle and metrics. Business data
stays in domain services such as `telephony`, `monitoring-tg` and
`monitoring-pf`.

## Generic job contract

The service is intentionally domain-agnostic:

- `owner_service` names the caller, for example `telephony`, `monitoring-tg`,
  `monitoring-pf` or a future Portal module.
- `owner_ref` is the caller's stable object reference, for example
  `beeline/{call_id}` or `channel/{message_id}`.
- `task_type` describes the technical task class, for example
  `transcribe`, `call_analysis`, `tg_analysis`, `pf_competitor_analysis`.
- `model_profile` selects a runtime profile, for example `whisperx`,
  `qwen2.5-14b`, `vision`, or a future provider profile.
- `input` and `result` are JSON payloads owned by the caller and worker.

This keeps the AI service as shared infrastructure rather than a
telephony-specific service.
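
As an illustration of the contract above, a caller could assemble a job description roughly like this. It is a minimal sketch: only the field names come from the list above, while the `JobRequest` class, the `to_payload` helper, the flat JSON layout and the sample values are assumptions, since the exact request body of `POST /api/v1/jobs` is not specified here.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any


@dataclass
class JobRequest:
    """Hypothetical caller-side view of one generic AI job."""

    owner_service: str   # who is asking, e.g. "telephony"
    owner_ref: str       # the caller's stable object reference
    task_type: str       # technical task class
    model_profile: str   # runtime profile
    input: dict[str, Any] = field(default_factory=dict)  # caller-owned JSON

    def to_payload(self) -> str:
        """Serialize to JSON; the flat layout is an assumption."""
        return json.dumps(asdict(self), sort_keys=True)


# Sample values are illustrative only.
job = JobRequest(
    owner_service="telephony",
    owner_ref="beeline/12345",
    task_type="call_analysis",
    model_profile="qwen2.5-14b",
    input={"transcript": "..."},
)
print(job.to_payload())
```

Because `input` is an opaque dictionary, the same class can describe a transcription job or a Telegram analysis job without any domain-specific fields.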

## Built-in workers

The first built-in worker processes `llm_chat` and `chat_completion` jobs whose
`model_profile` equals `LLM_MODEL`.

Input can be either explicit messages:

```json
{
  "messages": [
    {"role": "system", "content": "Answer as JSON."},
    {"role": "user", "content": "Classify this text"}
  ],
  "max_tokens": 256
}
```

or compact `system` / `user` fields. The completed job result contains
`content`, `model`, `usage` and `duration_ms`.
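
One way a worker could accept both input shapes is to normalize the compact form into a `messages` list before calling the model. The sketch below assumes that `system` is optional and `user` is required in the compact form, and that explicit `messages` take precedence. `normalize_messages` is a hypothetical helper, not the service's actual code.

```python
from typing import Any


def normalize_messages(job_input: dict[str, Any]) -> list[dict[str, str]]:
    """Return chat messages for either input shape (assumed semantics)."""
    messages = job_input.get("messages")
    if messages:
        # Explicit form: pass the messages through unchanged.
        return list(messages)

    user = job_input.get("user")
    if not user:
        raise ValueError("input needs either 'messages' or a 'user' field")

    result = []
    system = job_input.get("system")
    if system:
        # The system prompt is treated as optional (assumption).
        result.append({"role": "system", "content": system})
    result.append({"role": "user", "content": user})
    return result


compact = {"system": "Answer as JSON.", "user": "Classify this text"}
print(normalize_messages(compact))
```

With this normalization, the compact input above yields the same two messages as the explicit JSON example.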

## API

- `POST /api/v1/jobs` creates one job.
- `POST /api/v1/jobs/batch` creates many jobs with shared defaults.
- `POST /api/v1/jobs/claim` atomically claims pending jobs for a worker.
- `GET /api/v1/jobs/{id}` returns technical job state and result.
- `POST /api/v1/jobs/{id}/complete` stores a successful job result.
- `POST /api/v1/jobs/{id}/fail` stores the failure category and message of a
  failed job.
- `POST /api/v1/jobs/{id}/retry` resets failed or running jobs to `pending`.
- `GET /api/v1/stats` returns queue and error counters.
- `GET /api/v1/providers/status` checks configured AI providers without
  returning secrets.
- `GET /healthz` returns process health.
- `GET /readyz` checks PostgreSQL readiness.
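
The job-mutating endpoints above imply a small state machine, which the following in-memory sketch makes explicit. The state names `pending`, `running` and `failed` appear in the endpoint descriptions; the name `completed` and the rule that only `running` jobs can be completed or failed are assumptions, and `apply_action` is a hypothetical helper.

```python
# Allowed transitions: (current state, action) -> next state.
TRANSITIONS = {
    ("pending", "claim"): "running",
    ("running", "complete"): "completed",  # "completed" is an assumed name
    ("running", "fail"): "failed",
    ("failed", "retry"): "pending",
    ("running", "retry"): "pending",
}


def apply_action(state: str, action: str) -> str:
    """Return the next job state, or raise if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a job in state {state!r}") from None


# Walk one job through a failure, a retry and a successful second attempt.
state = "pending"
for action in ("claim", "fail", "retry", "claim", "complete"):
    state = apply_action(state, action)
print(state)  # completed
```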

## Configuration

- `HTTP_HOST`, default `0.0.0.0`
- `HTTP_PORT`, default `8080`
- `DATABASE_URL`, required
- `MIGRATE_ON_START`, default `true`
- `LLM_BASE_URL`, primary OpenAI-compatible LLM endpoint
- `LLM_API_KEY`, primary LLM API key
- `LLM_MODEL`, default `qwen2.5-14b`
- `LLM_TIMEOUT`, default `5m`
- `WHISPERX_URL`, WhisperX endpoint for transcription jobs
- `WORKER_ID`, default hostname
- `WORKER_POLL_INTERVAL`, default `2s`
- `WORKER_CLAIM_LIMIT`, default `4`
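
A loader for these settings might look like the sketch below. It assumes that durations are a single integer followed by a unit suffix (`ms`, `s`, `m`, `h`), as in the defaults `5m` and `2s`, and that `MIGRATE_ON_START` is the literal `true` or `false`. `parse_duration` and `load_config` are hypothetical helpers, and the variables without a documented default are omitted for brevity.

```python
import re
import socket
from datetime import timedelta

_UNITS = {"ms": "milliseconds", "s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text: str) -> timedelta:
    """Parse values such as "5m" or "2s" (single-unit format is assumed)."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text.strip())
    if not match:
        raise ValueError(f"invalid duration: {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def load_config(env: dict[str, str]) -> dict:
    """Build settings from an environment mapping using the listed defaults."""
    if not env.get("DATABASE_URL"):
        raise ValueError("DATABASE_URL is required")
    return {
        "http_host": env.get("HTTP_HOST", "0.0.0.0"),
        "http_port": int(env.get("HTTP_PORT", "8080")),
        "database_url": env["DATABASE_URL"],
        "migrate_on_start": env.get("MIGRATE_ON_START", "true").lower() == "true",
        "llm_model": env.get("LLM_MODEL", "qwen2.5-14b"),
        "llm_timeout": parse_duration(env.get("LLM_TIMEOUT", "5m")),
        "worker_id": env.get("WORKER_ID") or socket.gethostname(),
        "worker_poll_interval": parse_duration(env.get("WORKER_POLL_INTERVAL", "2s")),
        "worker_claim_limit": int(env.get("WORKER_CLAIM_LIMIT", "4")),
    }


# The database URL is a placeholder; pass os.environ in a real process.
config = load_config({"DATABASE_URL": "postgres://localhost/ai"})
print(config["llm_timeout"], config["worker_claim_limit"])
```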

## Next integration step

`telephony` should first mirror low-risk analysis jobs into this service while
continuing local processing. Remote execution can then be enabled per task type
behind a feature flag.
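
The rollout described above can be pictured as a per-task-type mode switch. The following sketch is only an assumption about how such a flag could be modelled on the `telephony` side: the mode names `local`, `mirror` and `remote`, the `ROLLOUT` table and `plan_execution` are hypothetical and not part of this service.

```python
# Hypothetical per-task-type rollout flags kept by the caller.
ROLLOUT = {
    "call_analysis": "mirror",  # local result is used, AI service gets a copy
    "transcribe": "local",      # not migrated yet
}


def plan_execution(task_type: str, flags: dict[str, str]) -> dict[str, bool]:
    """Decide where a task runs; unknown task types stay local."""
    mode = flags.get(task_type, "local")
    if mode not in ("local", "mirror", "remote"):
        raise ValueError(f"unknown rollout mode: {mode!r}")
    return {
        "run_locally": mode in ("local", "mirror"),
        "submit_to_ai_service": mode in ("mirror", "remote"),
        "use_remote_result": mode == "remote",
    }


print(plan_execution("call_analysis", ROLLOUT))
```

In `mirror` mode the local result stays authoritative, so a problem in the AI service cannot affect callers until a task type is switched to `remote`.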