Haikal Hilmi
Client work

100+ hours of video processed

In-Video Word Finder

Sole Engineer

Finds the exact moment a streamer says a target word (e.g. "LFG") for marketing clips by transcribing speech into searchable, timestamped text with Whisper.

Delivers the exact clip + URL where each searched word is spoken.

How it works

  • Source: Kick.com
  • Extract: Video → Audio
  • Transcribe: Speech-to-Text AI
  • Index: Searchable text
  • Output: Word finder
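
As an illustration of the Extract step, here is a minimal sketch that strips the audio track from a downloaded stream. It assumes ffmpeg is installed and on the PATH; the write-up does not say which tool was actually used, and the function names and file names are hypothetical. The output is 16 kHz mono WAV because that is the format Whisper works on internally.

```python
import subprocess
from pathlib import Path


def audio_extract_cmd(video: Path, audio: Path) -> list[str]:
    """Build an ffmpeg command that converts a video file to 16 kHz mono audio."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", str(video),
        "-vn",           # drop the video stream
        "-ac", "1",      # downmix to one channel
        "-ar", "16000",  # resample to 16 kHz
        str(audio),
    ]


def extract_audio(video: Path, audio: Path) -> None:
    """Run the extraction; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(audio_extract_cmd(video, audio), check=True)
```

Keeping the command builder separate from the call that runs it makes the command easy to inspect or log before processing a long recording.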

Problem

A marketing client needed to catch specific words spoken across long Kick.com streams.

Solution

Transcribe with Whisper, index words with timestamps, then auto-cut the clip at that moment.
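
The index-and-cut idea can be sketched as follows. This is an assumption-laden illustration, not the project's code: it assumes the transcript has the shape that the openai-whisper package returns from `transcribe(..., word_timestamps=True)` (segments containing `words` with `word`, `start`, and `end`), and the helper names and the 5-second padding are hypothetical.

```python
import re
from collections import defaultdict


def normalize(token: str) -> str:
    """Lowercase and strip punctuation/whitespace so ' LFG!' matches 'lfg'."""
    return re.sub(r"[^\w]", "", token).lower()


def build_index(result: dict) -> dict:
    """Map each normalized word to a list of (start, end) times in seconds."""
    index = defaultdict(list)
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            key = normalize(word["word"])
            if key:  # skip tokens that were pure punctuation
                index[key].append((word["start"], word["end"]))
    return dict(index)


def find_word(index: dict, target: str) -> list:
    """Return every (start, end) occurrence of the target word."""
    return index.get(normalize(target), [])


def clip_window(start: float, end: float, pad: float = 5.0) -> tuple:
    """Pad a word's timestamps into a clip range, clamped at the stream start."""
    return (max(0.0, start - pad), end + pad)
```

Each hit from `find_word` can be passed to `clip_window` to get the range to cut, and the start time can be attached to the stream URL that is delivered with the clip.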

Key decisions

  • Ran Whisper (medium) locally on a laptop GPU — accurate enough for keyword search at zero STT API cost.
  • Batch-processed 100+ hours of video; indexed transcripts with timestamps to auto-cut clips.
  • Whisper's language detection handled mixed-language streams.
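
A rough sketch of what local batch transcription might look like is shown below. It assumes the openai-whisper Python package (the write-up names the model but not the implementation); `load_local_whisper` and `transcribe_batch` are hypothetical names. Leaving the `language` argument unset lets Whisper detect the language for each file.

```python
from pathlib import Path
from typing import Iterable


def load_local_whisper(size: str = "medium"):
    """Load a Whisper model locally; it runs on the GPU when CUDA is available."""
    import whisper  # openai-whisper, imported lazily so the helpers below work without it

    return whisper.load_model(size)


def transcribe_batch(model, audio_paths: Iterable[Path]) -> dict:
    """Transcribe each file and keep the detected language and timed segments.

    `model` is any object with a Whisper-style `transcribe` method.
    """
    results = {}
    for path in audio_paths:
        # word_timestamps=True asks Whisper for per-word start/end times
        result = model.transcribe(str(path), word_timestamps=True)
        results[str(path)] = {
            "language": result.get("language"),
            "segments": result["segments"],
        }
    return results
```

Passing the model in as an argument means it is loaded once and reused across the whole batch, which matters when the batch covers many hours of audio.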

Tech stack

  • Python
  • Whisper
  • GPU