Pipeline architecture

How a Twitch broadcast becomes a TikTok clip

Six stages, all automated. Each broadcast you stream goes through the same path; you only see the final published clip.

Capture from Twitch

The recorder polls the Twitch Helix API for each channel registered in your dashboard. When a stream goes live, streamlink starts pulling the broadcast at the highest quality the source provides and chunks it into 12-minute MPEG-TS segments stored on disk.

MPEG-TS was chosen because it tolerates partial reads — if a segment is cut short by a Twitch disconnect or a server reboot, ffprobe can still read its resolution and duration, and the rest of the pipeline runs on what was captured. There is no "all or nothing" failure mode.

Transcribe with Whisper

Each completed segment is passed to faster-whisper (small model). Output is word-level: each word gets a start and end timestamp accurate to within 50ms, plus a confidence score. Transcription runs locally on the server — speech audio never leaves the host.

Streamer language is configured in the channel settings: ru for Russian-speaking creators, en for English. Whisper accepts the language hint and produces tighter results than auto-detect.

Score every window

A sliding window over the transcript ranks every potential 15–55s clip on a weighted sum of signals:

Emotion-keyword density — bilingual dictionary of high-energy words (excitement, surprise, anger, frustration).
Boring-keyword penalty — fillers, donation read-outs, opening greetings, low-information chatter.
Peak loudness — RMS audio analysis aligned to the transcript window.
Hook-phrase bonus — recognized opening hooks ("look at this", "no way", "wait, what?").
Anti-music penalty — long sustained-tone runs that score loud but aren't speech.
Short-burst bonus — sub-second exclamations that read well in short-form.

Each window's score is divided by 1.5 and clamped to a virality 0–100. Anything below the absolute threshold of 10 is dropped. The pipeline never publishes a clip purely because it was the "best of the worst" — boring segments produce zero clips and that's the desired outcome.

Quality gate via DeepSeek

The top-scored windows are sent (text only) to DeepSeek with a prompt asking for a 1–10 engagement rating. Anything that scores below 6 is skipped from rendering, no matter how the local scorer ranked it. This catches segments where the speech is technically present but not actually interesting — a content judgment the keyword scorer can't make.

Sending only the transcript text (not audio or video) keeps the data footprint tiny, the request fast, and the privacy boundary tight.

Render the vertical clip

For each surviving window, ffmpeg builds a 1080×1920 vertical render in a single pass:

Layout filter — gameplay-only, webcam-only, or 40/60 split, per the streamer's config.
Karaoke ASS subtitle burn-in — one yellow word, scaled 125%, Liberation Sans 72pt.
Hook-title overlay — DeepSeek-generated headline with fade-out (5s or 40% of clip).
Audio loudness normalization — EBU R128 to −16 LUFS plus soft-knee compressor.
Profanity censoring — detected swearing silenced in audio, masked with stars in subs.
Subscribe outro — 5s purple "ПОДПИШИСЬ" CTA card with a green channel link.
Progress bar overlay — thin Twitch-purple line growing across the bottom edge.
Video fade — 0.4s in / 0.8s out for a clean opening and exit.
Encoding — CRF 23, 160 kbps audio, automatic fallback to CRF 28 if >45 MB.

Caption & publish

The same DeepSeek call that wrote the hook title also produced a 2–3 sentence caption and 4–6 thematic hashtags. The clip plus this metadata is sent to Postiz, which then forwards to each connected platform via the official APIs:

TikTok — Content Posting API (Direct Post or Upload).
YouTube Shorts — YouTube Data API v3.
Instagram Reels — Instagram Graph API.

A copy of the rendered clip is also pushed to a Telegram channel of your choice — useful for previewing the queue, archiving, or sharing manually before auto-publish kicks in.

Stack & deployment

The Service runs as a Docker Compose stack on a single VPS in Helsinki, EU. Seven containers cover the full pipeline.

Python 3.11

Recorder logic

Streamlink

Twitch capture

FFmpeg

Render & encode

faster-whisper

Transcription

DeepSeek

Quality gate & titles

Postiz

Cross-platform OAuth

PostgreSQL

State store

nginx + Let's Encrypt

TLS termination

Data path & retention

Source recordings live on the server only as long as needed to extract the highlights — typically under one hour. Once a clip is rendered and accepted by the destination platform, the source segment is deleted. Generated clips are kept for up to 7 days for re-publishing or debugging, then auto-pruned.

OAuth tokens are stored encrypted at rest in the Postiz database. They are deleted within 24 hours of the user disconnecting an integration. Full retention rules are listed in the privacy policy.

Curious which streamers benefit most?

The use cases page maps the pipeline to four distinct creator profiles.

View use cases Read the changelog