Six stages, all automated. Each broadcast you stream goes through the same path; you only see the final published clip.
The recorder polls the Twitch Helix API for each channel registered in your dashboard. When a stream goes live, streamlink starts pulling the broadcast at the highest quality the source provides and chunks it into 12-minute MPEG-TS segments stored on disk.
MPEG-TS was chosen because it tolerates partial reads — if a segment is cut short by a Twitch disconnect or a server reboot, ffprobe can still read its resolution and duration, and the rest of the pipeline runs on what was captured. There is no "all or nothing" failure mode.
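As a rough illustration of that probe step, the sketch below builds an ffprobe call and parses its JSON output. The function names are hypothetical and the actual recorder code may differ; the only assumption about ffprobe is its standard `-show_entries` and `-of json` options.

```python
import json
import subprocess
from typing import List, Optional, Tuple


def ffprobe_command(path: str) -> List[str]:
    """Build an ffprobe call that prints width, height and duration as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=width,height:format=duration",
        "-of", "json",
        path,
    ]


def parse_probe(output: str) -> Tuple[Optional[int], Optional[int], Optional[float]]:
    """Extract (width, height, duration_seconds); missing fields become None."""
    data = json.loads(output)
    streams = data.get("streams") or [{}]
    width = streams[0].get("width")
    height = streams[0].get("height")
    duration = data.get("format", {}).get("duration")
    return (width, height, float(duration) if duration is not None else None)


def probe_segment(path: str) -> Tuple[Optional[int], Optional[int], Optional[float]]:
    """Run ffprobe on one segment (requires ffprobe on PATH)."""
    result = subprocess.run(
        ffprobe_command(path), capture_output=True, text=True, check=True
    )
    return parse_probe(result.stdout)
```

Returning None for a missing field, rather than raising, lets a truncated segment still flow into the later stages with whatever metadata survived.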
Each completed segment is passed to faster-whisper (small model). Output is word-level: each word gets a start and end timestamp accurate to within 50 ms, plus a confidence score. Transcription runs locally on the server, so speech audio never leaves the host.
Streamer language is configured in the channel settings: ru for Russian-speaking creators, en for English-speaking ones. Whisper takes the language as a hint, which generally gives more accurate transcripts than auto-detection.
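A minimal sketch of the transcription step, assuming the faster-whisper Python package with its `WhisperModel.transcribe(..., language=..., word_timestamps=True)` call. The `Word` dataclass and both function names are illustrative, not the service's own code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Word:
    text: str
    start: float       # seconds from the start of the segment
    end: float
    confidence: float  # faster-whisper reports this as `probability`


def flatten_words(segments: Iterable) -> List[Word]:
    """Collect the word-level entries from faster-whisper segments into one list."""
    words: List[Word] = []
    for segment in segments:
        for w in (segment.words or []):
            words.append(Word(w.word.strip(), w.start, w.end, w.probability))
    return words


def transcribe_segment(path: str, language: str) -> List[Word]:
    """Transcribe one recorded segment with an explicit language hint ("ru" or "en")."""
    # Imported lazily so flatten_words stays usable without the model installed.
    from faster_whisper import WhisperModel

    model = WhisperModel("small")
    segments, _info = model.transcribe(path, language=language, word_timestamps=True)
    return flatten_words(segments)
```

In practice the model would be loaded once and reused across segments; it is constructed inside the function here only to keep the sketch short.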
A sliding window over the transcript ranks every potential 15–55s clip on a weighted sum of signals:
Each window's score is divided by 1.5 and clamped to a virality score between 0 and 100. Anything below the absolute threshold of 10 is dropped. The pipeline never publishes a clip purely because it was the "best of the worst": boring segments produce zero clips, and that's the desired outcome.
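The windowing and normalization can be sketched as follows. The 15–55s bounds, the division by 1.5, the 0–100 clamp and the threshold of 10 come from the description above; the 5-second step, the function names and the raw scores passed in are assumptions for illustration, and the signal weights themselves are not modeled.

```python
from typing import Iterator, List, Tuple

Window = Tuple[float, float]  # (start_seconds, end_seconds)


def candidate_windows(duration: float, min_len: float = 15.0,
                      max_len: float = 55.0, step: float = 5.0) -> Iterator[Window]:
    """Yield every (start, end) window of 15-55s that fits in the segment.

    The step size is an assumption; the real scorer may slide differently.
    """
    start = 0.0
    while start + min_len <= duration:
        length = min_len
        while length <= max_len and start + length <= duration:
            yield (start, start + length)
            length += step
        start += step


def to_virality(raw_score: float) -> float:
    """Divide the raw weighted sum by 1.5 and clamp the result to 0-100."""
    return max(0.0, min(100.0, raw_score / 1.5))


def select_windows(scored: List[Tuple[Window, float]],
                   threshold: float = 10.0) -> List[Tuple[Window, float]]:
    """Normalize raw scores, drop windows below the threshold, rank best first."""
    kept = [(w, to_virality(raw)) for w, raw in scored]
    kept = [(w, v) for w, v in kept if v >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Because the threshold is absolute rather than relative, `select_windows` returns an empty list when every window in a segment scores low.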
The top-scored windows are sent (text only) to DeepSeek with a prompt asking for a 1–10 engagement rating. Anything rated below 6 is not rendered, no matter how the local scorer ranked it. This catches segments where speech is technically present but not actually interesting, a content judgment the keyword scorer can't make.
Sending only the transcript text (not audio or video) keeps the data footprint tiny, the request fast, and the privacy boundary tight.
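A sketch of that gate, under stated assumptions: the `rate` callable stands in for the DeepSeek request (its prompt and response format are not shown here), and `parse_rating` is a hypothetical helper for pulling a 1–10 number out of a free-text reply. Only the transcript text is passed to the rater.

```python
import re
from typing import Callable, Dict, List, Optional


def parse_rating(reply: str) -> Optional[int]:
    """Return the first standalone integer from 1 to 10 in the reply, if any."""
    match = re.search(r"\b(10|[1-9])\b", reply)
    return int(match.group(1)) if match else None


def gate_by_rating(windows: List[Dict], rate: Callable[[str], Optional[int]],
                   min_rating: int = 6) -> List[Dict]:
    """Keep only windows whose transcript text is rated at least min_rating.

    A window with no usable rating is skipped; that fallback is an assumption.
    """
    kept = []
    for window in windows:
        rating = rate(window["text"])  # text only, no audio or video
        if rating is not None and rating >= min_rating:
            kept.append({**window, "rating": rating})
    return kept
```

Passing the rater in as a function keeps the gate itself free of network code, so it can be exercised with a stub.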
For each surviving window, ffmpeg builds a 1080×1920 vertical render in a single pass:
The same DeepSeek call that wrote the hook title also produced a 2–3 sentence caption and 4–6 thematic hashtags. The clip plus this metadata is sent to Postiz, which then forwards to each connected platform via the official APIs:
A copy of the rendered clip is also pushed to a Telegram channel of your choice — useful for previewing the queue, archiving, or sharing manually before auto-publish kicks in.
The Service runs as a Docker Compose stack on a single VPS in Helsinki, EU. Seven containers cover the full pipeline.
Source recordings live on the server only as long as needed to extract the highlights — typically under one hour. Once a clip is rendered and accepted by the destination platform, the source segment is deleted. Generated clips are kept for up to 7 days for re-publishing or debugging, then auto-pruned.
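The 7-day clip retention rule could be expressed as a small check like the one below. This is only a sketch: the function name and the mapping of clip names to creation times are hypothetical, and the actual pruning job is not described here.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def expired_clips(created_at: Dict[str, datetime], now: datetime,
                  max_age: timedelta = timedelta(days=7)) -> List[str]:
    """Return the names of clips older than max_age, sorted by name."""
    return sorted(
        name for name, created in created_at.items()
        if now - created > max_age
    )
```

A clip that is exactly 7 days old is still kept under this comparison; only older clips are returned for deletion.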
OAuth tokens are stored encrypted at rest in the Postiz database. They are deleted within 24 hours of the user disconnecting an integration. Full retention rules are listed in the privacy policy.
The use cases page maps the pipeline to four distinct creator profiles.