Origin Voice Studio

Origin Voice Studio (OVS) — Technical Architecture

by Inception Point AI

Version: Enterprise Production v4.0 (Monolithic Writer + Content Factory)
Platform: AWS g5.12xlarge | Ubuntu 24.04 | 4x NVIDIA A10G (96GB VRAM) | 186GB RAM | 3.5TB NVMe
URL: https://ovs.inceptionpoint.ai
Distribution: Spreaker → Apple Podcasts, Spotify, iHeartRadio, Amazon Music

Architecture Overview

OVS is a fully autonomous AI podcast production platform. It researches viral trends, forges franchise show bibles, casts unique voices via Hume AI, writes scripts via Claude Sonnet 4, renders audio across 4 parallel GPU workers, archives to AWS S3, and publishes to Spreaker for distribution across all podcast platforms.

The Pipeline (per episode)

Grok-3 (xAI)        → Viral trend analysis (X/Twitter cultural intelligence)
Brave Search         → Real-time grounding (news + articles)
Claude Sonnet 4      → Monolithic script writing (ONE call, full episode)
Style Injector       → Algorithmic speed/latency/overlap tags
4x F5-TTS (GPUs 0-3) → Parallel voice synthesis
AudioGen             → Sound effects generation
Hume AI TTS          → Narrator voice synthesis
Pedalboard           → Hollywood mastering chain
AWS S3               → Cloud archive + local cleanup
Spreaker API         → Podcast distribution (Apple, Spotify, etc.)
ChromaDB             → Series Bible (narrative continuity)

GPU Architecture

All 4 GPUs dedicated to audio production. No local LLM — all script writing via cloud APIs.

GPU 0 (A10G 24GB): Audio Worker A (:8001) — F5-TTS, AudioGen, MusicGen
GPU 1 (A10G 24GB): Audio Worker B (:8009) — F5-TTS, AudioGen, MusicGen
GPU 2 (A10G 24GB): Audio Worker C (:8010) — F5-TTS, AudioGen, MusicGen
GPU 3 (A10G 24GB): Audio Worker D (:8003) — F5-TTS, AudioGen, MusicGen
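
The layout above pairs each GPU with one worker port. As a minimal sketch of how a client might spread render jobs across the four workers, the snippet below assumes round-robin assignment and localhost addressing; neither is stated in this document, and `WORKER_PORTS` and `assign_workers` are hypothetical names.

```python
from itertools import cycle

# GPU index -> worker port, copied from the list above.
WORKER_PORTS = {0: 8001, 1: 8009, 2: 8010, 3: 8003}


def assign_workers(job_ids, host="127.0.0.1"):
    """Pair each job with a worker base URL in round-robin order (assumed policy)."""
    urls = cycle(f"http://{host}:{port}" for _, port in sorted(WORKER_PORTS.items()))
    return {job: next(urls) for job in job_ids}


if __name__ == "__main__":
    for job, url in assign_workers(["ep1", "ep2", "ep3", "ep4", "ep5"]).items():
        print(job, "->", url)
```

With five jobs, the fifth wraps around to the worker on GPU 0. A production scheduler would more likely pick the least-loaded worker, which this sketch does not attempt.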

vLLM (Llama 70B) can run on GPUs 1 and 2 as a fallback, but vllm.service is disabled by default. All primary inference routes through cloud APIs.

Services Map

Service	Port	Systemd	Role
Frontend (Gradio)	7861	ovs	Studio production UI
React Studio	3000	ovs-studio	Casting, Library, Factory, Network
API Gateway	8080	ovs	Unified FastAPI REST API
GPU Worker A	8001	ovs	F5-TTS + AudioGen + MusicGen (GPU 0)
GPU Worker B	8009	ovs-gpu1	F5-TTS + AudioGen + MusicGen (GPU 1)
GPU Worker C	8010	ovs-gpu2	F5-TTS + AudioGen + MusicGen (GPU 2)
GPU Worker D	8003	ovs-gpu3	F5-TTS + AudioGen + MusicGen (GPU 3)
Mastering	8002	ovs	Pedalboard DSP, ducking, reverb, limiter
Dialogue Coach	8004	ovs	Text normalization for TTS
Redis	6379	redis-server	Job queue + result backend

The Monolithic Writer

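The Monolithic Writer replaces the earlier 7-stage agent relay with a single LLM call (Claude Sonnet 4 by default; GPT-4o and other models are selectable). Because the whole episode comes from one call, there are no hand-offs between agents and therefore no agent drift.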

WRITER_MODEL (set in .env): sonnet (default), openai, opus, haiku, llama
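
A minimal sketch of how the `WRITER_MODEL` setting could be resolved at startup. The allowed values and the `sonnet` default come from the line above; the function name, the case-insensitive matching, and the choice to raise on unknown values are assumptions for illustration, not the behavior of the actual code.

```python
import os

# Allowed values as listed above; "sonnet" is the documented default.
WRITER_MODELS = ("sonnet", "openai", "opus", "haiku", "llama")


def resolve_writer_model(env=None):
    """Return the configured writer model, falling back to the default."""
    env = os.environ if env is None else env
    raw = env.get("WRITER_MODEL") or "sonnet"  # unset or empty -> default
    value = raw.strip().lower()
    if value not in WRITER_MODELS:
        raise ValueError(f"Unsupported WRITER_MODEL: {value!r}")
    return value
```

Failing fast on an unrecognized value keeps a typo in `.env` from silently routing scripts to an unintended model.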

ONE prompt → ONE call to Sonnet 4 → ONE complete episode script
                    ↓
         Style Injector (algorithmic)
         - Speed tags based on sentence analysis
         - Latency tags for pacing
         - Overlap tags for interruptions
         - Parenthetical stripping
                    ↓
         Audio Renderer (4 parallel GPUs)
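
The Style Injector steps in the diagram are algorithmic rather than model-driven. The sketch below illustrates two of them, parenthetical stripping and a speed tag derived from sentence length. The `[SPEED: ...]` tag syntax, the word-count thresholds, and both function names are hypothetical; the document does not specify the real tag format or the analysis rules.

```python
import re

# Matches a parenthetical stage direction such as "(sighs)" plus leading spaces.
_PAREN = re.compile(r"\s*\([^()]*\)")


def strip_parentheticals(line):
    """Remove parenthetical directions and tidy the leftover whitespace."""
    return re.sub(r"\s{2,}", " ", _PAREN.sub("", line)).strip()


def inject_speed_tag(line, short=6, long=20):
    """Prefix a hypothetical speed tag chosen from the sentence's word count."""
    text = strip_parentheticals(line)
    words = len(text.split())
    if words <= short:
        speed = "fast"    # short lines read as quick, punchy delivery
    elif words >= long:
        speed = "slow"    # long lines get a slower, more deliberate pace
    else:
        speed = "normal"
    return f"[SPEED: {speed}] {text}"


if __name__ == "__main__":
    print(inject_speed_tag("Wait (whispering) what was that?"))
```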

Key features:
- Narrator host introduction (genre-matched voice per franchise)
- Internal monologue tags ([INTERNAL_MONOLOGUE: CHARACTER]) rendered with thought vocal chain
- Diegetic framing devices (found footage, radio broadcast, interrogation, etc.)
- Psychoacoustic foley (textural SFX descriptions)
- Strict serialization via ChromaDB Series Bible
- Smash cut outro with narrator closing
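
The internal monologue tag listed among the key features has a fixed shape, so it can be recognized with a single regular expression. The sketch below assumes the spoken text follows the tag on the same line, which the document does not confirm; `parse_line` is a hypothetical helper and not the contents of services/parser.py.

```python
import re

# Tag format taken from the feature list: [INTERNAL_MONOLOGUE: CHARACTER]
_MONOLOGUE = re.compile(r"^\[INTERNAL_MONOLOGUE:\s*([^\]]+?)\s*\]\s*(.*)$")


def parse_line(line):
    """Return (character, text, is_monologue); character is None for untagged lines."""
    stripped = line.strip()
    match = _MONOLOGUE.match(stripped)
    if match:
        return match.group(1), match.group(2), True
    return None, stripped, False


if __name__ == "__main__":
    print(parse_line("[INTERNAL_MONOLOGUE: MARA] I should not have come here."))
```

A renderer could use the boolean flag to route the line through the thought vocal chain instead of the normal dialogue chain.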

Content Factory

Parallel pipeline for maximum GPU utilization:

Script Factory Thread (Sonnet API) → Queue (max 10) → 4 Audio Worker Threads (GPUs 0-3)

Safety gates:
- Queue backup: pauses script writing when queue > 6, resumes when drained
- Render failure: halts if 10+ scripts written with 0 renders
- TTS quality gate: halts on any CRITICAL RENDERER DROP
- Auto-resume: script factory waits for audio to catch up, doesn't permanently halt
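
The queue backup gate behaves like a high-water mark with hysteresis: writing stops once the queue exceeds 6 and stays stopped until the queue has drained. A minimal sketch follows, assuming "drained" means an empty queue; `ScriptGate` is a hypothetical class, not the one in core/content_factory.py.

```python
import queue


class ScriptGate:
    """Pause script writing above a high-water mark; resume once the queue drains."""

    def __init__(self, jobs, high_water=6):
        self.jobs = jobs
        self.high_water = high_water
        self.paused = False

    def may_write(self):
        """Update the paused flag from the current depth and report whether to write."""
        depth = self.jobs.qsize()
        if depth > self.high_water:
            self.paused = True       # backlog too deep: stop producing scripts
        elif depth == 0:
            self.paused = False      # audio workers caught up: resume
        return not self.paused


if __name__ == "__main__":
    jobs = queue.Queue(maxsize=10)   # queue capacity from the pipeline line above
    gate = ScriptGate(jobs)
    for i in range(7):
        jobs.put(f"script-{i}")
    print(gate.may_write())          # False: depth 7 exceeds the mark
```

Keeping the gate paused between the two thresholds avoids rapid toggling when the depth hovers around 6.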

Throughput: ~0.65 episodes/minute, ~39/hour, ~935/day
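
The hourly and daily figures follow from the per-minute rate. The short check below reproduces them to within rounding and also derives an approximate render time per episode per worker, on the assumption (not stated above) that audio rendering is the bottleneck and that load is spread evenly across the 4 workers.

```python
EPISODES_PER_MINUTE = 0.65  # rate quoted above
WORKERS = 4                 # audio worker threads, one per GPU


def scale(rate_per_minute, minutes):
    """Episodes produced over a span of minutes at a constant rate."""
    return rate_per_minute * minutes


per_hour = scale(EPISODES_PER_MINUTE, 60)
per_day = scale(EPISODES_PER_MINUTE, 60 * 24)
# Assumes audio rendering is the bottleneck and the load is spread evenly.
minutes_per_episode_per_worker = WORKERS / EPISODES_PER_MINUTE

print(f"{per_hour:.0f}/hour, {per_day:.0f}/day, "
      f"{minutes_per_episode_per_worker:.1f} min per episode per worker")
```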

AI Model Stack

Model	Provider	Purpose	Cost
Claude Sonnet 4	Anthropic	Script writing (default)	~$0.03/script
GPT-4o	OpenAI	Show forging, personas	~$0.04/call
Grok-3	xAI	Viral trend analysis	~$0.003/call
DALL-E 3	OpenAI	Character portraits + franchise thumbnails	~$0.04/image
Hume AI TTS	Hume	Voice design + narrator synthesis	~$0.02/voice
F5-TTS v1 Base	Local (GPU)	Voice cloning for all characters	Free (GPU compute)
AudioGen Medium	Local (GPU)	SFX and ambient generation	Free (GPU compute)
MusicGen Medium	Local (GPU)	Music score generation	Free (GPU compute)
Brave Search	Brave	Real-time research grounding	~$0.001/query

Total cost per episode: ~$0.61

REST API

Base URL: https://ovs.inceptionpoint.ai/api
Auth: X-API-Key header

Prefix	Routes	Purpose
`/api/script`	generate, optimize, autopilot-run	Script generation + optimization
`/api/audio`	voices, design, render, render-direct, status	Voice library + production
`/api/personas`	CRUD + generate-portrait	Digital Soul persona management
`/api/dashboard`	metrics	Telemetry + Spreaker stats
`/api/library`	shows, episodes, audio streaming	Media library + playback
`/api/factory`	start, stop, status	Content Factory control
`/api/network`	forge, start, stop, status, trends	Network Executive
`/api/studio/autopilot`	create, list, start, pause, stop, force-run	Autopilot fleet
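
A minimal sketch of an authenticated call built from the base URL, the `X-API-Key` header, and the route table above. It assumes that a route name is appended directly to its prefix (for example `/api/factory/status`) and that reads use GET while calls with a body use POST; the document does not confirm either, and `build_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "https://ovs.inceptionpoint.ai/api"


def build_request(path, api_key, payload=None):
    """Build (but do not send) an authenticated request to the OVS API."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}",
        data=data,
        method="GET" if data is None else "POST",  # assumed verb mapping
    )
    req.add_header("X-API-Key", api_key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


if __name__ == "__main__":
    req = build_request("factory/status", api_key="YOUR_KEY")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without network access.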

Distribution Pipeline

Episode rendered → Library DB → S3 Archive → Spreaker API → RSS Feed → Apple/Spotify/etc.

Spreaker integration:
- Auto-creates shows per franchise
- Uploads with monetization-optimized metadata (tags, categories, explicit=false)
- Injects DAI silence markers for mid-roll ad placement
- Sets franchise thumbnails as cover art
- Season/episode numbering for Apple Podcasts

File Structure

/opt/ovs/
├── .env                          API keys (Anthropic, OpenAI, xAI, Hume, Brave, Spreaker, AWS)
├── .api_keys                     OVS REST API keys
├── api/
│   ├── main.py                   Unified API Gateway (:8080)
│   ├── auth.py                   API key authentication
│   ├── schemas.py                Pydantic models
│   ├── routes_script.py          Script generation + optimization
│   ├── routes_audio.py           Voice library + production
│   ├── routes_persona.py         Digital Soul CRUD + portraits
│   ├── routes_dashboard.py       Telemetry + Spreaker stats
│   ├── routes_library.py         Media library + S3 streaming
│   ├── routes_factory.py         Content Factory control
│   ├── routes_network.py         Network Executive
│   └── routes_autopilot.py       Autopilot fleet management
├── core/
│   ├── audio_renderer.py         Batch TTS + timeline assembly + mastering
│   ├── audio_utils.py            Numpy audio helpers + seamless loop
│   ├── autopilot_scheduler.py    APScheduler-based episode scheduling
│   ├── content_factory.py        Parallel GPU production pipeline
│   ├── frontier_api.py           Multi-model router (Sonnet/GPT-4o/Grok/Opus)
│   ├── image_renderer.py         DALL-E 3 character portraits
│   ├── library_db.py             SQLite CMS (shows, episodes, costs)
│   ├── network_executive.py      Franchise forging + Hume casting + trend analysis
│   ├── psychology_prompt.py      Persona sliders → LLM behavioral rules
│   ├── s3_uploader.py            AWS S3 archive + local cleanup
│   ├── showrunner_daemon.py      Monolithic Writer + full pipeline orchestrator
│   ├── spreaker_publisher.py     Spreaker API publishing + monetization
│   ├── style_injector.py         Algorithmic speed/latency/overlap injection
│   ├── telemetry.py              Production metrics tracking
│   ├── thumbnail_renderer.py     DALL-E 3 franchise cover art
│   ├── voice_library.py          Voice file management
│   └── web_research.py           Brave Search + DuckDuckGo fallback
├── services/
│   ├── gpu_api.py                F5-TTS, AudioGen, MusicGen, Whisper (:8001/8003/8009/8010)
│   ├── mastering_api.py          Pedalboard DSP chain (:8002)
│   ├── memory_engine.py          ChromaDB Series Bible
│   ├── parser.py                 Script parser (brackets, bare names, internal monologue)
│   ├── persona_db.py             Digital Soul 5-layer persona CRUD
│   └── translator_api.py         Dialogue Coach (:8004)
├── frontend/
│   ├── app.py                    Gradio Studio UI
│   ├── casting.py                Gradio Casting (legacy)
│   └── api_client.py             HTTP client for API gateway
├── studio/                       Next.js React app
│   └── src/app/
│       ├── casting/page.tsx      Director's Binder (voices + personas)
│       ├── library/page.tsx      Media Library (browse + play + download)
│       ├── factory/page.tsx      Content Factory (GPU pipeline control)
│       └── network/page.tsx      Network Executive (forge + trends)
├── db/
│   ├── chroma_lore/              ChromaDB Series Bible storage
│   ├── style_vault/              Gold standard script examples
│   ├── network_state.json        Active franchises + episode counts
│   ├── factory_state.json        Factory runtime state
│   ├── personas.db               Digital Soul persona database
│   └── spreaker_shows.json       Spreaker show ID cache
├── portraits/                    DALL-E character portraits
├── thumbnails/                   DALL-E franchise cover art
├── voices/                       Voice reference WAVs (24kHz mono)
├── logs/                         Service logs + library.db + telemetry.db
└── docs/                         HTML documentation + dashboard

Operations

Start/Stop

systemctl start ovs              # All OVS services + API Gateway
systemctl start ovs-studio       # React UI
systemctl start ovs-gpu1         # GPU Worker B
systemctl start ovs-gpu2         # GPU Worker C
systemctl start ovs-gpu3         # GPU Worker D

Site Map

https://ovs.inceptionpoint.ai/                    Gradio Studio
https://ovs.inceptionpoint.ai/studio/casting       React Casting Office
https://ovs.inceptionpoint.ai/studio/library       React Media Library
https://ovs.inceptionpoint.ai/studio/factory       React Content Factory
https://ovs.inceptionpoint.ai/docs/                Documentation Hub
https://ovs.inceptionpoint.ai/docs/dashboard.html  Executive Dashboard
https://ovs.inceptionpoint.ai/api-docs             Swagger API Docs

Origin Voice Studio (OVS) by Inception Point AI
Architecture v4.0, March 2026
g5.12xlarge | 4x A10G | Sonnet 4 + GPT-4o + Grok-3 | F5-TTS | Spreaker Distribution