Client work
100+ hours of video processed
In-Video Word Finder
Sole Engineer
Finds the exact moment a streamer says a target word (e.g. "LFG") so it can be cut into a marketing clip, using AI speech-to-text to make the stream searchable.
Delivers the clip and URL for every moment a searched word is spoken.
How it works
- Source: Kick.com
- Extract: video → audio
- Transcribe: speech-to-text AI
- Index: searchable text
- Output: word finder
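A minimal sketch of the Extract step, assuming the streams are already downloaded from Kick.com as local video files and ffmpeg is on PATH. The function names are hypothetical; the flags are standard ffmpeg options. Whisper itself decodes input through ffmpeg and resamples to 16 kHz mono, so pre-extracting is optional; it mainly leaves a small, reusable audio file per stream.

```python
import subprocess
from pathlib import Path


def audio_extract_cmd(video: Path, audio: Path) -> list[str]:
    """Build an ffmpeg command that drops the video and writes 16 kHz mono audio."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", str(video),  # input: the downloaded stream (hypothetical local file)
        "-vn",             # discard the video stream
        "-ac", "1",        # downmix to a single (mono) channel
        "-ar", "16000",    # resample to 16 kHz, the rate Whisper works at
        str(audio),
    ]


def extract_audio(video: Path, audio: Path) -> None:
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(audio_extract_cmd(video, audio), check=True)
```

Keeping the command builder separate from the `subprocess.run` call makes the exact ffmpeg invocation easy to log or inspect before running it across many files.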
Problem
A marketing client needed to find every point in long Kick.com streams where specific words were spoken.
Solution
Transcribe each stream with Whisper, index every word with its timestamp, then auto-cut a clip at each match.
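A minimal sketch of the index-and-cut idea, assuming the transcript comes from openai-whisper's `transcribe(..., word_timestamps=True)`, which returns `segments` whose `words` entries carry `word`, `start`, and `end` fields (in seconds). The helper names, the 3-second padding, and the lowercase/strip-punctuation normalization are illustrative assumptions, not the project's actual code. Spoken acronyms can be transcribed in more than one spelling, so a real search may need to try several variants of a keyword.

```python
import re
from collections import defaultdict

Span = tuple[float, float]


def normalize(token: str) -> str:
    # Lowercase and trim leading/trailing whitespace and punctuation: " LFG!" -> "lfg"
    return re.sub(r"^\W+|\W+$", "", token.lower())


def build_index(result: dict) -> dict[str, list[Span]]:
    """Map each normalized word to every (start, end) time at which it is spoken."""
    index: dict[str, list[Span]] = defaultdict(list)
    for segment in result["segments"]:
        for w in segment.get("words", []):
            key = normalize(w["word"])
            if key:  # skip tokens that were pure punctuation
                index[key].append((w["start"], w["end"]))
    return dict(index)  # plain dict, so lookups of unknown words don't insert keys


def clip_window(start: float, end: float, pad: float = 3.0) -> Span:
    """Pad a word's timestamps to give the clip context, clamped at 0 s."""
    return max(0.0, start - pad), end + pad


def cut_cmd(video: str, start: float, end: float, out: str) -> list[str]:
    """ffmpeg command that copies the [start, end] window out of the source video."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.2f}",       # seek on the input side (fast)
        "-i", video,
        "-t", f"{end - start:.2f}",  # clip duration in seconds
        "-c", "copy",                # stream copy: no re-encode, cut lands on a keyframe
        out,
    ]
```

For a keyword such as "lfg", each `(start, end)` in `index.get("lfg", [])` would go through `clip_window` and then `cut_cmd`, and the resulting command through `subprocess.run(..., check=True)`. Dropping `-c copy` re-encodes the clip, which is slower but gives frame-accurate cuts.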
Key decisions
- Ran Whisper (medium model) locally on a laptop GPU: accurate enough for keyword search, with zero speech-to-text API cost.
- Batch-processed 100+ hours of video; indexed transcripts with timestamps to auto-cut clips.
- Relied on Whisper's built-in language detection to handle mixed-language streams.
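A minimal sketch of the batch step, assuming openai-whisper's Python API (`whisper.load_model` and `model.transcribe(..., word_timestamps=True)`). The transcriber is passed in as a function so the loop can be exercised without a GPU. `transcribe_batch`, the one-JSON-file-per-stream cache, and the skip-if-already-done behaviour are illustrative assumptions, not a description of the original pipeline. Whisper reports the detected language in the result's `language` field, which the sketch keeps alongside the word timestamps.

```python
import json
from pathlib import Path
from typing import Callable

# Any function that takes an audio path and returns a Whisper-style result dict.
Transcriber = Callable[[str], dict]


def slim_words(segment: dict) -> list[dict]:
    # Keep only what the word index needs; float() guards against numpy scalars.
    return [
        {"word": w["word"], "start": float(w["start"]), "end": float(w["end"])}
        for w in segment.get("words", [])
    ]


def transcribe_batch(audio_files: list[Path], out_dir: Path,
                     transcribe: Transcriber) -> list[Path]:
    """Transcribe each file once, caching one JSON transcript per stream."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for audio in audio_files:
        out = out_dir / f"{audio.stem}.json"
        if out.exists():  # resume-friendly: skip streams already transcribed
            continue
        result = transcribe(str(audio))
        slim = {
            "language": result.get("language"),  # Whisper's detected language code
            "segments": [{"words": slim_words(s)} for s in result["segments"]],
        }
        out.write_text(json.dumps(slim))
        written.append(out)
    return written
```

With the real model loaded once via `model = whisper.load_model("medium", device="cuda")`, the call would be `transcribe_batch(files, Path("transcripts"), lambda p: model.transcribe(p, word_timestamps=True))`. Loading the model a single time and caching each transcript means an interrupted run over many hours of audio can resume without redoing finished streams.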
Tech stack
- Python
- Whisper
- GPU