Version: Enterprise Production v4.0 (Monolithic Writer + Content Factory)
Platform: AWS g5.12xlarge | Ubuntu 24.04 | 4x NVIDIA A10G (96GB VRAM) | 186GB RAM | 3.5TB NVMe
URL: https://ovs.inceptionpoint.ai
Distribution: Spreaker → Apple Podcasts, Spotify, iHeartRadio, Amazon Music
OVS is a fully autonomous AI podcast production platform. It researches viral trends, forges franchise show bibles, casts unique voices via Hume AI, writes scripts via Claude Sonnet 4, renders audio across 4 parallel GPU workers, archives to AWS S3, and publishes to Spreaker for distribution across all podcast platforms.
Grok-3 (xAI) → Viral trend analysis (X/Twitter cultural intelligence)
Brave Search → Real-time grounding (news + articles)
Claude Sonnet 4 → Monolithic script writing (ONE call, full episode)
Style Injector → Algorithmic speed/latency/overlap tags
4x F5-TTS (GPUs 0-3) → Parallel voice synthesis
AudioGen → Sound effects generation
Hume AI TTS → Narrator voice synthesis
Pedalboard → Hollywood mastering chain
AWS S3 → Cloud archive + local cleanup
Spreaker API → Podcast distribution (Apple, Spotify, etc.)
ChromaDB → Series Bible (narrative continuity)
All four GPUs are dedicated to audio production. No local LLM runs by default; all script writing goes through cloud APIs.
GPU 0 (A10G 24GB): Audio Worker A (:8001) — F5-TTS, AudioGen, MusicGen
GPU 1 (A10G 24GB): Audio Worker B (:8009) — F5-TTS, AudioGen, MusicGen
GPU 2 (A10G 24GB): Audio Worker C (:8010) — F5-TTS, AudioGen, MusicGen
GPU 3 (A10G 24GB): Audio Worker D (:8003) — F5-TTS, AudioGen, MusicGen
vLLM (Llama 70B) is available on GPUs 1+2 as a fallback but is disabled by default (vllm.service disabled). All primary inference routes through cloud APIs.
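The worker layout above maps naturally onto a small lookup table. The following is a minimal sketch, assuming the workers are reached over HTTP on localhost and that jobs are handed out round-robin. The `WORKER_PORTS` name, the host, and the scheduling policy are illustrative assumptions and are not taken from the OVS codebase.

```python
from itertools import cycle

# GPU index -> worker port, as listed above.
WORKER_PORTS = {0: 8001, 1: 8009, 2: 8010, 3: 8003}


def worker_urls(host: str = "127.0.0.1") -> list[str]:
    """Return base URLs for the four audio workers, ordered by GPU index."""
    return [f"http://{host}:{WORKER_PORTS[gpu]}" for gpu in sorted(WORKER_PORTS)]


def round_robin(urls: list[str]):
    """Return an endless rotation over the worker URLs (assumed policy)."""
    return cycle(urls)


rotation = round_robin(worker_urls())
first_five = [next(rotation) for _ in range(5)]  # wraps back to GPU 0 on the fifth pick
```

Note that the ports are not sequential (Worker D sits on :8003, between the Mastering and Dialogue Coach services), so a lookup table is safer than computing ports from the GPU index.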
| Service | Port | Systemd | Role |
|---|---|---|---|
| Frontend (Gradio) | 7861 | ovs | Studio production UI |
| React Studio | 3000 | ovs-studio | Casting, Library, Factory, Network |
| API Gateway | 8080 | ovs | Unified FastAPI REST API |
| GPU Worker A | 8001 | ovs | F5-TTS + AudioGen + MusicGen (GPU 0) |
| GPU Worker B | 8009 | ovs-gpu1 | F5-TTS + AudioGen + MusicGen (GPU 1) |
| GPU Worker C | 8010 | ovs-gpu2 | F5-TTS + AudioGen + MusicGen (GPU 2) |
| GPU Worker D | 8003 | ovs-gpu3 | F5-TTS + AudioGen + MusicGen (GPU 3) |
| Mastering | 8002 | ovs | Pedalboard DSP, ducking, reverb, limiter |
| Dialogue Coach | 8004 | ovs | Text normalization for TTS |
| Redis | 6379 | redis-server | Job queue + result backend |
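With ten services spread over ten ports, a quick reachability probe can help when debugging a partial start. The sketch below only checks that each port accepts a TCP connection; it assumes all services listen on localhost, and the helper names `port_is_open` and `check_services` are hypothetical.

```python
import socket

# Service name -> port, copied from the table above.
SERVICE_PORTS = {
    "Frontend (Gradio)": 7861,
    "React Studio": 3000,
    "API Gateway": 8080,
    "GPU Worker A": 8001,
    "GPU Worker B": 8009,
    "GPU Worker C": 8010,
    "GPU Worker D": 8003,
    "Mastering": 8002,
    "Dialogue Coach": 8004,
    "Redis": 6379,
}


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(ports: dict[str, int]) -> dict[str, bool]:
    """Probe every service port and report which ones accept connections."""
    return {name: port_is_open(port) for name, port in ports.items()}
```

Calling `check_services(SERVICE_PORTS)` on the host returns a name-to-boolean map. An open port does not prove the service is healthy, only that something is listening.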
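The Monolithic Writer replaces the 7-stage agent relay with a single model call (Claude Sonnet 4 by default, GPT-4o as an option). With one call per episode, there is zero agent drift.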
WRITER_MODEL (set in .env): sonnet (default), openai, opus, haiku, llama
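One way to read this setting is sketched below, assuming the value is exposed as an ordinary environment variable once `.env` is loaded. The function name `resolve_writer_model` and the validation behavior are assumptions for illustration, not the actual OVS loader.

```python
import os

# Values listed for WRITER_MODEL above; "sonnet" is the documented default.
ALLOWED_WRITERS = ("sonnet", "openai", "opus", "haiku", "llama")


def resolve_writer_model(env=None) -> str:
    """Read WRITER_MODEL from the environment, falling back to the default."""
    env = os.environ if env is None else env
    value = env.get("WRITER_MODEL", "sonnet").strip().lower()
    if value not in ALLOWED_WRITERS:
        # Failing fast avoids silently writing scripts with the wrong model.
        raise ValueError(f"Unsupported WRITER_MODEL: {value!r}")
    return value
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real environment.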
ONE prompt → ONE call to Sonnet 4 → ONE complete episode script
↓
Style Injector (algorithmic)
- Speed tags based on sentence analysis
- Latency tags for pacing
- Overlap tags for interruptions
- Parenthetical stripping
↓
Audio Renderer (4 parallel GPUs)
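The Style Injector step is described as purely algorithmic, so it can be approximated with plain string processing. The sketch below covers only parenthetical stripping and speed tagging. The `[SPEED: ...]` tag syntax, the word-count thresholds, and all function names are assumptions; the real tag format in `style_injector.py` is not documented here.

```python
import re

# A parenthetical aside plus any whitespace directly before it.
PARENTHETICAL = re.compile(r"\s*\([^()]*\)")


def strip_parentheticals(line: str) -> str:
    """Remove parenthetical asides such as "(sighs)" from a script line."""
    return re.sub(r"\s{2,}", " ", PARENTHETICAL.sub("", line)).strip()


def speed_tag(sentence: str) -> str:
    """Pick a speed tag from sentence length and punctuation (assumed heuristic)."""
    words = len(sentence.split())
    if sentence.rstrip().endswith("!") or words <= 4:
        return "[SPEED: fast]"  # hypothetical tag syntax
    if words >= 20:
        return "[SPEED: slow]"
    return "[SPEED: normal]"


def inject_style(line: str) -> str:
    """Strip parentheticals, then prefix the line with a speed tag."""
    clean = strip_parentheticals(line)
    return f"{speed_tag(clean)} {clean}"
```

Latency and overlap tags would follow the same pattern: a pure function from text (and possibly neighboring lines) to a tag, with no model call involved.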
Key features:
- Narrator host introduction (genre-matched voice per franchise)
- Internal monologue tags ([INTERNAL_MONOLOGUE: CHARACTER]) rendered with thought vocal chain
- Diegetic framing devices (found footage, radio broadcast, interrogation, etc.)
- Psychoacoustic foley (textural SFX descriptions)
- Strict serialization (episode-to-episode continuity) via the ChromaDB Series Bible
- Smash cut outro with narrator closing
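The internal monologue tag listed above is the one piece of script syntax given explicitly, so it lends itself to a small parsing sketch. This assumes the tag opens a line and the spoken text follows on the same line; the returned dictionary shape and the `"thought"` label are hypothetical and may differ from what `services/parser.py` produces.

```python
import re

# Matches the documented tag form, e.g. "[INTERNAL_MONOLOGUE: MARA] text".
MONOLOGUE_TAG = re.compile(r"^\[INTERNAL_MONOLOGUE:\s*([^\]]+?)\s*\]\s*(.*)$")


def parse_line(line: str) -> dict:
    """Classify a script line as internal monologue or ordinary text."""
    stripped = line.strip()
    match = MONOLOGUE_TAG.match(stripped)
    if match:
        return {
            "kind": "internal_monologue",
            "character": match.group(1),
            "text": match.group(2),
            "vocal_chain": "thought",  # hypothetical label for the thought chain
        }
    return {"kind": "other", "character": None, "text": stripped, "vocal_chain": None}
```

A renderer could then branch on `vocal_chain` to route monologue lines through a different effects chain than spoken dialogue.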
Parallel pipeline for maximum GPU utilization:
Script Factory Thread (Sonnet API) → Queue (max 10) → 4 Audio Worker Threads (GPUs 0-3)
Safety gates:
- Queue backup: pauses script writing when queue > 6, resumes when drained
- Render failure: halts if 10+ scripts written with 0 renders
- TTS quality gate: halts on any CRITICAL RENDERER DROP
- Auto-resume: on queue backup, the script factory waits for audio to catch up instead of halting permanently
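The safety gates above amount to backpressure with hysteresis plus two hard stops. The sketch below models that logic in isolation, assuming "drained" means an empty queue. The class and constant names are hypothetical and the real `content_factory.py` may track this state differently.

```python
from dataclasses import dataclass

QUEUE_MAX = 10       # queue capacity from the pipeline description
PAUSE_ABOVE = 6      # pause script writing once the queue exceeds this depth
STALL_SCRIPTS = 10   # halt when this many scripts exist with zero renders


@dataclass
class FactoryGates:
    scripts_written: int = 0
    renders_done: int = 0
    paused: bool = False

    def may_write(self, queue_depth: int) -> bool:
        """Apply queue backpressure: pause above 6, resume once drained."""
        if queue_depth > PAUSE_ABOVE:
            self.paused = True
        elif queue_depth == 0:  # assumption: "drained" means an empty queue
            self.paused = False
        return not self.paused and queue_depth < QUEUE_MAX

    def must_halt(self, critical_drop: bool = False) -> bool:
        """Halt on a stalled renderer or on any critical renderer drop."""
        stalled = self.scripts_written >= STALL_SCRIPTS and self.renders_done == 0
        return stalled or critical_drop
```

Keeping the pause flag sticky until the queue empties prevents the writer from flapping on and off while the depth hovers around the threshold.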
Throughput: ~0.65 episodes/minute, ~39/hour, ~935/day
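The hourly and daily figures follow directly from the per-minute rate. As a quick check of the arithmetic, using only the 0.65 episodes/minute figure quoted above:

```python
EPISODES_PER_MINUTE = 0.65  # rate quoted above

per_hour = EPISODES_PER_MINUTE * 60
per_day = per_hour * 24

print(f"{per_hour:.0f} episodes/hour, {per_day:.0f} episodes/day")
```

This gives 39 per hour and about 936 per day, in line with the rounded daily figure above.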
| Model | Provider | Purpose | Cost |
|---|---|---|---|
| Claude Sonnet 4 | Anthropic | Script writing (default) | ~$0.03/script |
| GPT-4o | OpenAI | Show forging, personas | ~$0.04/call |
| Grok-3 | xAI | Viral trend analysis | ~$0.003/call |
| DALL-E 3 | OpenAI | Character portraits + franchise thumbnails | ~$0.04/image |
| Hume AI TTS | Hume | Voice design + narrator synthesis | ~$0.02/voice |
| F5-TTS v1 Base | Local (GPU) | Voice cloning for all characters | Free (GPU compute) |
| AudioGen Medium | Local (GPU) | SFX and ambient generation | Free (GPU compute) |
| MusicGen Medium | Local (GPU) | Music score generation | Free (GPU compute) |
| Brave Search | Brave | Real-time research grounding | ~$0.001/query |
Total cost per episode: ~$0.61
Base URL: https://ovs.inceptionpoint.ai/api
Auth: X-API-Key header
| Prefix | Routes | Purpose |
|---|---|---|
| /api/script | generate, optimize, autopilot-run | Script generation + optimization |
| /api/audio | voices, design, render, render-direct, status | Voice library + production |
| /api/personas | CRUD + generate-portrait | Digital Soul persona management |
| /api/dashboard | metrics | Telemetry + Spreaker stats |
| /api/library | shows, episodes, audio streaming | Media library + playback |
| /api/factory | start, stop, status | Content Factory control |
| /api/network | forge, start, stop, status, trends | Network Executive |
| /api/studio/autopilot | create, list, start, pause, stop, force-run | Autopilot fleet |
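Every route sits under the base URL and expects the API key in the `X-API-Key` header. A minimal sketch of building such a request with the standard library follows. It assumes the Content Factory `status` route answers GET requests, which the route table does not state; `build_request` is a hypothetical helper name.

```python
import urllib.request

BASE_URL = "https://ovs.inceptionpoint.ai/api"  # base URL from above


def build_request(route: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for an API route with the X-API-Key header set."""
    url = f"{BASE_URL}/{route.lstrip('/')}"
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method="GET")


# Assumption: the Content Factory status route accepts GET.
request = build_request("factory/status", api_key="YOUR_API_KEY")
# To send it: urllib.request.urlopen(request, timeout=10)
```

The exact methods and payloads for each route are listed in the Swagger docs served by the gateway, which should be treated as authoritative.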
Episode rendered → Library DB → S3 Archive → Spreaker API → RSS Feed → Apple/Spotify/etc.
Spreaker integration:
- Auto-creates shows per franchise
- Uploads with monetization-optimized metadata (tags, categories, explicit=false)
- Injects DAI silence markers for mid-roll ad placement
- Sets franchise thumbnails as cover art
- Season/episode numbering for Apple Podcasts
/opt/ovs/
├── .env                          API keys (Anthropic, OpenAI, xAI, Hume, Brave, Spreaker, AWS)
├── .api_keys                     OVS REST API keys
├── api/
│   ├── main.py                   Unified API Gateway (:8080)
│   ├── auth.py                   API key authentication
│   ├── schemas.py                Pydantic models
│   ├── routes_script.py          Script generation + optimization
│   ├── routes_audio.py           Voice library + production
│   ├── routes_persona.py         Digital Soul CRUD + portraits
│   ├── routes_dashboard.py       Telemetry + Spreaker stats
│   ├── routes_library.py         Media library + S3 streaming
│   ├── routes_factory.py         Content Factory control
│   ├── routes_network.py         Network Executive
│   └── routes_autopilot.py       Autopilot fleet management
├── core/
│   ├── audio_renderer.py         Batch TTS + timeline assembly + mastering
│   ├── audio_utils.py            Numpy audio helpers + seamless loop
│   ├── autopilot_scheduler.py    APScheduler-based episode scheduling
│   ├── content_factory.py        Parallel GPU production pipeline
│   ├── frontier_api.py           Multi-model router (Sonnet/GPT-4o/Grok/Opus)
│   ├── image_renderer.py         DALL-E 3 character portraits
│   ├── library_db.py             SQLite CMS (shows, episodes, costs)
│   ├── network_executive.py      Franchise forging + Hume casting + trend analysis
│   ├── psychology_prompt.py      Persona sliders → LLM behavioral rules
│   ├── s3_uploader.py            AWS S3 archive + local cleanup
│   ├── showrunner_daemon.py      Monolithic Writer + full pipeline orchestrator
│   ├── spreaker_publisher.py     Spreaker API publishing + monetization
│   ├── style_injector.py         Algorithmic speed/latency/overlap injection
│   ├── telemetry.py              Production metrics tracking
│   ├── thumbnail_renderer.py     DALL-E 3 franchise cover art
│   ├── voice_library.py          Voice file management
│   └── web_research.py           Brave Search + DuckDuckGo fallback
├── services/
│   ├── gpu_api.py                F5-TTS, AudioGen, MusicGen, Whisper (:8001/8003/8009/8010)
│   ├── mastering_api.py          Pedalboard DSP chain (:8002)
│   ├── memory_engine.py          ChromaDB Series Bible
│   ├── parser.py                 Script parser (brackets, bare names, internal monologue)
│   ├── persona_db.py             Digital Soul 5-layer persona CRUD
│   └── translator_api.py         Dialogue Coach (:8004)
├── frontend/
│   ├── app.py                    Gradio Studio UI
│   ├── casting.py                Gradio Casting (legacy)
│   └── api_client.py             HTTP client for API gateway
├── studio/                       Next.js React app
│   └── src/app/
│       ├── casting/page.tsx      Director's Binder (voices + personas)
│       ├── library/page.tsx      Media Library (browse + play + download)
│       ├── factory/page.tsx      Content Factory (GPU pipeline control)
│       └── network/page.tsx      Network Executive (forge + trends)
├── db/
│   ├── chroma_lore/              ChromaDB Series Bible storage
│   ├── style_vault/              Gold standard script examples
│   ├── network_state.json        Active franchises + episode counts
│   ├── factory_state.json        Factory runtime state
│   ├── personas.db               Digital Soul persona database
│   └── spreaker_shows.json       Spreaker show ID cache
├── portraits/                    DALL-E character portraits
├── thumbnails/                   DALL-E franchise cover art
├── voices/                       Voice reference WAVs (24kHz mono)
├── logs/                         Service logs + library.db + telemetry.db
└── docs/                         HTML documentation + dashboard
systemctl start ovs # All OVS services + API Gateway
systemctl start ovs-studio # React UI
systemctl start ovs-gpu1 # GPU Worker B
systemctl start ovs-gpu2 # GPU Worker C
systemctl start ovs-gpu3 # GPU Worker D
https://ovs.inceptionpoint.ai/ Gradio Studio
https://ovs.inceptionpoint.ai/studio/casting React Casting Office
https://ovs.inceptionpoint.ai/studio/library React Media Library
https://ovs.inceptionpoint.ai/studio/factory React Content Factory
https://ovs.inceptionpoint.ai/docs/ Documentation Hub
https://ovs.inceptionpoint.ai/docs/dashboard.html Executive Dashboard
https://ovs.inceptionpoint.ai/api-docs Swagger API Docs
Origin Voice Studio (OVS) by Inception Point AI
Architecture v4.0, March 2026
g5.12xlarge | 4x A10G | Sonnet 4 + GPT-4o + Grok-3 | F5-TTS | Spreaker Distribution