Origin Voice Studio (OVS) — Technical Architecture

by Inception Point AI

Version: Enterprise Production v4.0 (Monolithic Writer + Content Factory)
Platform: AWS g5.12xlarge | Ubuntu 24.04 | 4x NVIDIA A10G (96GB VRAM) | 186GB RAM | 3.5TB NVMe
URL: https://ovs.inceptionpoint.ai
Distribution: Spreaker → Apple Podcasts, Spotify, iHeartRadio, Amazon Music


Architecture Overview

OVS is a fully autonomous AI podcast production platform. It researches viral trends, forges franchise show bibles, casts unique voices via Hume AI, writes scripts via Claude Sonnet 4, renders audio across 4 parallel GPU workers, archives to AWS S3, and publishes to Spreaker for distribution to Apple Podcasts, Spotify, iHeartRadio, and Amazon Music.

The Pipeline (per episode)

Grok-3 (xAI)        → Viral trend analysis (X/Twitter cultural intelligence)
Brave Search         → Real-time grounding (news + articles)
Claude Sonnet 4      → Monolithic script writing (ONE call, full episode)
Style Injector       → Algorithmic speed/latency/overlap tags
4x F5-TTS (GPUs 0-3) → Parallel voice synthesis
AudioGen             → Sound effects generation
Hume AI TTS          → Narrator voice synthesis
Pedalboard           → Hollywood mastering chain
AWS S3               → Cloud archive + local cleanup
Spreaker API         → Podcast distribution (Apple, Spotify, etc.)
ChromaDB             → Series Bible (narrative continuity)

GPU Architecture

All 4 GPUs are dedicated to audio production. No local LLM runs by default; all script writing goes through cloud APIs.

GPU 0 (A10G 24GB): Audio Worker A (:8001) — F5-TTS, AudioGen, MusicGen
GPU 1 (A10G 24GB): Audio Worker B (:8009) — F5-TTS, AudioGen, MusicGen
GPU 2 (A10G 24GB): Audio Worker C (:8010) — F5-TTS, AudioGen, MusicGen
GPU 3 (A10G 24GB): Audio Worker D (:8003) — F5-TTS, AudioGen, MusicGen

vLLM (Llama 70B) is available on GPUs 1+2 as fallback but disabled by default (vllm.service disabled). All primary inference routes through cloud APIs.
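
The worker layout above can be written down as a small lookup table. This is a minimal sketch, not the platform's actual dispatcher: the GPU indices and ports come from the list above, while `WORKERS`, `worker_url`, the `localhost` host, and the round-robin policy are assumptions made for illustration.

```python
from itertools import cycle

# GPU index -> (worker name, port), copied from the GPU list above.
WORKERS = {
    0: ("Audio Worker A", 8001),
    1: ("Audio Worker B", 8009),
    2: ("Audio Worker C", 8010),
    3: ("Audio Worker D", 8003),
}


def worker_url(gpu: int, host: str = "localhost") -> str:
    """Return the base URL of the worker bound to a GPU (host is assumed)."""
    _name, port = WORKERS[gpu]
    return f"http://{host}:{port}"


def round_robin():
    """Yield worker URLs in GPU order, forever (hypothetical dispatch policy)."""
    for gpu in cycle(sorted(WORKERS)):
        yield worker_url(gpu)
```

Note that the ports are not sequential (8001, 8009, 8010, 8003), so a lookup table is safer than computing a port from the GPU index.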


Services Map

Service            Port  Systemd       Role
Frontend (Gradio)  7861  ovs           Studio production UI
React Studio       3000  ovs-studio    Casting, Library, Factory, Network
API Gateway        8080  ovs           Unified FastAPI REST API
GPU Worker A       8001  ovs           F5-TTS + AudioGen + MusicGen (GPU 0)
GPU Worker B       8009  ovs-gpu1      F5-TTS + AudioGen + MusicGen (GPU 1)
GPU Worker C       8010  ovs-gpu2      F5-TTS + AudioGen + MusicGen (GPU 2)
GPU Worker D       8003  ovs-gpu3      F5-TTS + AudioGen + MusicGen (GPU 3)
Mastering          8002  ovs           Pedalboard DSP, ducking, reverb, limiter
Dialogue Coach     8004  ovs           Text normalization for TTS
Redis              6379  redis-server  Job queue + result backend

The Monolithic Writer

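The Monolithic Writer replaces the earlier 7-stage agent relay with a single LLM call (Claude Sonnet 4 by default, GPT-4o as an option). With no hand-offs between agents, there is no agent drift.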

WRITER_MODEL (set in .env): sonnet (default), openai, opus, haiku, llama

ONE prompt → ONE call to Sonnet 4 → ONE complete episode script
                    ↓
         Style Injector (algorithmic)
         - Speed tags based on sentence analysis
         - Latency tags for pacing
         - Overlap tags for interruptions
         - Parenthetical stripping
                    ↓
         Audio Renderer (4 parallel GPUs)
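
The Style Injector step in the diagram above can be illustrated with a small post-processing pass. This is only a sketch under stated assumptions: the source does not show the tag syntax or the analysis rules, so the `[SPEED: ...]` format, the word-count thresholds, and the function names below are all hypothetical. Latency and overlap tags are omitted.

```python
import re

# Matches one "(stage direction)" plus any whitespace in front of it.
PAREN = re.compile(r"\s*\([^()]*\)")


def strip_parentheticals(line: str) -> str:
    """Remove (stage directions) so the TTS engine never reads them aloud."""
    return PAREN.sub("", line).strip()


def speed_tag(line: str) -> str:
    """Pick a speed label from crude sentence features (hypothetical rules)."""
    words = line.split()
    if line.endswith("!") or len(words) <= 4:
        return "fast"
    if len(words) >= 20 or line.endswith("..."):
        return "slow"
    return "normal"


def inject(line: str) -> str:
    """Strip parentheticals, then prefix an assumed [SPEED: ...] tag."""
    clean = strip_parentheticals(line)
    return f"[SPEED: {speed_tag(clean)}] {clean}"
```

Because the pass is purely rule-based, it produces the same tags for the same script on every run, which is the practical benefit of an algorithmic injector over a second LLM call.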

Key features:
- Narrator host introduction (genre-matched voice per franchise)
- Internal monologue tags ([INTERNAL_MONOLOGUE: CHARACTER]) rendered with thought vocal chain
- Diegetic framing devices (found footage, radio broadcast, interrogation, etc.)
- Psychoacoustic foley (textural SFX descriptions)
- Strict serialization via ChromaDB Series Bible
- Smash cut outro with narrator closing
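
As an illustration of the internal monologue tag named above, the following sketch extracts the character and the thought text from a script line. The tag form `[INTERNAL_MONOLOGUE: CHARACTER]` is taken from the feature list; the assumption that the thought follows the tag on the same line, and the names `Monologue` and `parse_monologue`, are hypothetical and may not match what `services/parser.py` does.

```python
import re
from typing import NamedTuple, Optional

# Tag form from the feature list; the trailing text layout is an assumption.
TAG = re.compile(r"^\[INTERNAL_MONOLOGUE:\s*([^\]]+)\]\s*(.*)$")


class Monologue(NamedTuple):
    character: str
    text: str


def parse_monologue(line: str) -> Optional[Monologue]:
    """Return (character, text) for a monologue line, or None for other lines."""
    match = TAG.match(line.strip())
    if match is None:
        return None
    return Monologue(match.group(1).strip(), match.group(2).strip())
```

A renderer could use the non-None result to route the line through a separate effects chain while keeping the same character voice.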


Content Factory

Parallel pipeline for maximum GPU utilization:

Script Factory Thread (Sonnet API) → Queue (max 10) → 4 Audio Worker Threads (GPUs 0-3)

Safety gates:
- Queue backup: pauses script writing when queue > 6, resumes when drained
- Render failure: halts if 10+ scripts written with 0 renders
- TTS quality gate: halts on any CRITICAL RENDERER DROP
- Auto-resume: script factory waits for audio to catch up, doesn't permanently halt
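
The queue backup gate describes a pause threshold and a separate resume condition, which is a hysteresis pattern. Below is a minimal sketch of that idea, not the code in `core/content_factory.py`: the threshold of 6 and the queue size of 10 come from the text, while the class name, the method name, and the reading of "drained" as an empty queue are assumptions.

```python
from queue import Queue


class BackpressureGate:
    """Hysteresis gate: pause above a high-water mark, resume once drained."""

    def __init__(self, high_water: int = 6, resume_at: int = 0):
        self.high_water = high_water  # "queue > 6" from the safety gates
        self.resume_at = resume_at    # assumed meaning of "drained"
        self.paused = False

    def may_write(self, depth: int) -> bool:
        """Update the pause state for the current queue depth and report it."""
        if self.paused and depth <= self.resume_at:
            self.paused = False
        elif not self.paused and depth > self.high_water:
            self.paused = True
        return not self.paused


scripts: Queue = Queue(maxsize=10)  # "Queue (max 10)" from the pipeline line
gate = BackpressureGate()
```

The script factory thread would call `gate.may_write(scripts.qsize())` before each writer call. Using two thresholds keeps the writer from flapping on and off when the queue depth hovers around 6.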

Throughput: ~0.65 episodes/minute, ~39/hour, ~935/day
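
The hourly and daily figures follow from the per-minute rate by simple scaling. The sketch below redoes that arithmetic from the rounded rate; it gives 936 per day, which matches the quoted ~935 if the unrounded rate is slightly below 0.65 (an assumption, since the source gives only rounded values).

```python
EPISODES_PER_MINUTE = 0.65  # rounded rate quoted above

per_hour = round(EPISODES_PER_MINUTE * 60)       # episodes per hour
per_day = round(EPISODES_PER_MINUTE * 60 * 24)   # episodes per day

print(per_hour, per_day)
```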


AI Model Stack

Model            Provider     Purpose                                     Cost
Claude Sonnet 4  Anthropic    Script writing (default)                    ~$0.03/script
GPT-4o           OpenAI       Show forging, personas                      ~$0.04/call
Grok-3           xAI          Viral trend analysis                        ~$0.003/call
DALL-E 3         OpenAI       Character portraits + franchise thumbnails  ~$0.04/image
Hume AI TTS      Hume         Voice design + narrator synthesis           ~$0.02/voice
F5-TTS v1 Base   Local (GPU)  Voice cloning for all characters            Free (GPU compute)
AudioGen Medium  Local (GPU)  SFX and ambient generation                  Free (GPU compute)
MusicGen Medium  Local (GPU)  Music score generation                      Free (GPU compute)
Brave Search     Brave        Real-time research grounding                ~$0.001/query

Total cost per episode: ~$0.61


REST API

Base URL: https://ovs.inceptionpoint.ai/api
Auth: X-API-Key header

Prefix                 Routes                                         Purpose
/api/script            generate, optimize, autopilot-run              Script generation + optimization
/api/audio             voices, design, render, render-direct, status  Voice library + production
/api/personas          CRUD + generate-portrait                       Digital Soul persona management
/api/dashboard         metrics                                        Telemetry + Spreaker stats
/api/library           shows, episodes, audio streaming               Media library + playback
/api/factory           start, stop, status                            Content Factory control
/api/network           forge, start, stop, status, trends             Network Executive
/api/studio/autopilot  create, list, start, pause, stop, force-run    Autopilot fleet
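
To show how the base URL, a route, and the X-API-Key header fit together, here is a hedged sketch that builds (but does not send) an authenticated request using only the standard library. The base URL and header name come from this section; the helper name `build_request`, the choice of GET for reads and POST with a JSON body for writes, and the exact path layout (`factory/status` under the base URL) are assumptions, so check the Swagger docs for the real methods and payloads.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "https://ovs.inceptionpoint.ai/api"


def build_request(path: str, api_key: str, payload: Optional[dict] = None) -> Request:
    """Build an authenticated request; POST if a payload is given, else GET."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = Request(
        f"{BASE_URL}/{path.lstrip('/')}",
        data=data,
        method="POST" if data is not None else "GET",
    )
    # urllib stores the name as "X-api-key"; HTTP header names are case-insensitive.
    req.add_header("X-API-Key", api_key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


status_request = build_request("factory/status", api_key="YOUR_KEY_HERE")
```

Passing the returned object to `urllib.request.urlopen` would perform the call; that step is left out here so the example runs without network access or a real key.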

Distribution Pipeline

Episode rendered → Library DB → S3 Archive → Spreaker API → RSS Feed → Apple/Spotify/etc.

Spreaker integration:
- Auto-creates shows per franchise
- Uploads with monetization-optimized metadata (tags, categories, explicit=false)
- Injects DAI silence markers for mid-roll ad placement
- Sets franchise thumbnails as cover art
- Season/episode numbering for Apple Podcasts


File Structure

/opt/ovs/
├── .env                          API keys (Anthropic, OpenAI, xAI, Hume, Brave, Spreaker, AWS)
├── .api_keys                     OVS REST API keys
├── api/
│   ├── main.py                   Unified API Gateway (:8080)
│   ├── auth.py                   API key authentication
│   ├── schemas.py                Pydantic models
│   ├── routes_script.py          Script generation + optimization
│   ├── routes_audio.py           Voice library + production
│   ├── routes_persona.py         Digital Soul CRUD + portraits
│   ├── routes_dashboard.py       Telemetry + Spreaker stats
│   ├── routes_library.py         Media library + S3 streaming
│   ├── routes_factory.py         Content Factory control
│   ├── routes_network.py         Network Executive
│   └── routes_autopilot.py       Autopilot fleet management
├── core/
│   ├── audio_renderer.py         Batch TTS + timeline assembly + mastering
│   ├── audio_utils.py            Numpy audio helpers + seamless loop
│   ├── autopilot_scheduler.py    APScheduler-based episode scheduling
│   ├── content_factory.py        Parallel GPU production pipeline
│   ├── frontier_api.py           Multi-model router (Sonnet/GPT-4o/Grok/Opus)
│   ├── image_renderer.py         DALL-E 3 character portraits
│   ├── library_db.py             SQLite CMS (shows, episodes, costs)
│   ├── network_executive.py      Franchise forging + Hume casting + trend analysis
│   ├── psychology_prompt.py      Persona sliders → LLM behavioral rules
│   ├── s3_uploader.py            AWS S3 archive + local cleanup
│   ├── showrunner_daemon.py      Monolithic Writer + full pipeline orchestrator
│   ├── spreaker_publisher.py     Spreaker API publishing + monetization
│   ├── style_injector.py         Algorithmic speed/latency/overlap injection
│   ├── telemetry.py              Production metrics tracking
│   ├── thumbnail_renderer.py     DALL-E 3 franchise cover art
│   ├── voice_library.py          Voice file management
│   └── web_research.py           Brave Search + DuckDuckGo fallback
├── services/
│   ├── gpu_api.py                F5-TTS, AudioGen, MusicGen, Whisper (:8001/8003/8009/8010)
│   ├── mastering_api.py          Pedalboard DSP chain (:8002)
│   ├── memory_engine.py          ChromaDB Series Bible
│   ├── parser.py                 Script parser (brackets, bare names, internal monologue)
│   ├── persona_db.py             Digital Soul 5-layer persona CRUD
│   └── translator_api.py         Dialogue Coach (:8004)
├── frontend/
│   ├── app.py                    Gradio Studio UI
│   ├── casting.py                Gradio Casting (legacy)
│   └── api_client.py             HTTP client for API gateway
├── studio/                       Next.js React app
│   └── src/app/
│       ├── casting/page.tsx      Director's Binder (voices + personas)
│       ├── library/page.tsx      Media Library (browse + play + download)
│       ├── factory/page.tsx      Content Factory (GPU pipeline control)
│       └── network/page.tsx      Network Executive (forge + trends)
├── db/
│   ├── chroma_lore/              ChromaDB Series Bible storage
│   ├── style_vault/              Gold standard script examples
│   ├── network_state.json        Active franchises + episode counts
│   ├── factory_state.json        Factory runtime state
│   ├── personas.db               Digital Soul persona database
│   └── spreaker_shows.json       Spreaker show ID cache
├── portraits/                    DALL-E character portraits
├── thumbnails/                   DALL-E franchise cover art
├── voices/                       Voice reference WAVs (24kHz mono)
├── logs/                         Service logs + library.db + telemetry.db
└── docs/                         HTML documentation + dashboard

Operations

Start/Stop

systemctl start ovs              # All OVS services + API Gateway
systemctl start ovs-studio       # React UI
systemctl start ovs-gpu1         # GPU Worker B
systemctl start ovs-gpu2         # GPU Worker C
systemctl start ovs-gpu3         # GPU Worker D

Site Map

https://ovs.inceptionpoint.ai/                    Gradio Studio
https://ovs.inceptionpoint.ai/studio/casting       React Casting Office
https://ovs.inceptionpoint.ai/studio/library       React Media Library
https://ovs.inceptionpoint.ai/studio/factory       React Content Factory
https://ovs.inceptionpoint.ai/docs/                Documentation Hub
https://ovs.inceptionpoint.ai/docs/dashboard.html  Executive Dashboard
https://ovs.inceptionpoint.ai/api-docs             Swagger API Docs

Origin Voice Studio (OVS) by Inception Point AI
Architecture v4.0, March 2026
g5.12xlarge | 4x A10G | Sonnet 4 + GPT-4o + Grok-3 | F5-TTS | Spreaker Distribution